Overview
Dataset statistics
| Number of variables | 34 |
|---|---|
| Number of observations | 19717 |
| Missing cells | 221379 |
| Missing cells (%) | 33.0% |
| Duplicate rows | 157 |
| Duplicate rows (%) | 0.8% |
| Total size in memory | 5.1 MiB |
| Average record size in memory | 272.0 B |
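The percentage rows in this table follow from its count rows. A minimal Python check of that arithmetic, assuming the missing-cell share is taken over every cell (variables × observations) and the duplicate share over every row; the variable names are illustrative and not part of the report:

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 34
n_observations = 19717
missing_cells = 221379
duplicate_rows = 157

# Assumption: missing cells are measured against all cells in the table,
# duplicate rows against all rows.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"total cells: {total_cells}")
print(f"missing cells: {missing_pct:.1f}%")
print(f"duplicate rows: {duplicate_pct:.1f}%")
```

Both results round to the values shown in the table (33.0% and 0.8%).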
Variable types
| Categorical | 16 |
|---|---|
| Text | 18 |
Reproduction
| Analysis started | 2024-11-04 16:12:54.101887 |
|---|---|
| Analysis finished | 2024-11-04 16:13:18.026024 |
| Duration | 23.92 seconds |
| Software version | ydata-profiling v4.12.0 |
| Download configuration | config.json |
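The duration row can be checked against the two timestamps. A short sketch using only the Python standard library:

```python
from datetime import datetime

# Timestamps copied from the "Reproduction" table.
started = datetime.fromisoformat("2024-11-04 16:12:54.101887")
finished = datetime.fromisoformat("2024-11-04 16:13:18.026024")

# Elapsed wall-clock time between start and finish, in seconds.
duration_seconds = (finished - started).total_seconds()
print(f"{duration_seconds:.2f} seconds")
```

The difference rounds to the 23.92 seconds reported above.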
Variables
what_is_your_age_#_years
Categorical
| Distinct | 11 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 154.2 KiB |
Length
| Max length | 5 |
|---|---|
| Median length | 5 |
| Mean length | 4.9898565 |
| Min length | 3 |
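The mean length appears to be the total number of characters in the column divided by the number of observations. A quick check, using the character total of 98385 reported for this variable under "Most occurring categories" and the 19717 observations from the overview. The formula is inferred from these figures, not taken from the tool's documentation:

```python
total_characters = 98385  # character total reported for this variable
n_observations = 19717    # from "Dataset statistics"; this variable has no missing values

# Assumed definition: average number of characters per value.
mean_length = total_characters / n_observations
print(round(mean_length, 7))
```

The result agrees with the reported mean length to the precision shown in the table.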
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 22-24 |
|---|---|
| 2nd row | 40-44 |
| 3rd row | 55-59 |
| 4th row | 40-44 |
| 5th row | 22-24 |
Common Values
| Value | Count | Frequency (%) |
| 25-29 | 4458 | 22.6% |
| 22-24 | 3610 | 18.3% |
| 30-34 | 3120 | 15.8% |
| 18-21 | 2502 | 12.7% |
| 35-39 | 2087 | 10.6% |
| 40-44 | 1439 | 7.3% |
| 45-49 | 949 | 4.8% |
| 50-54 | 692 | 3.5% |
| 55-59 | 422 | 2.1% |
| 60-69 | 338 | 1.7% |
Most occurring characters
| Value | Count | Frequency (%) |
| 2 | 22248 | 22.6% |
| - | 19617 | 19.9% |
| 4 | 13637 | 13.9% |
| 3 | 10414 | 10.6% |
| 5 | 10144 | 10.3% |
| 9 | 8254 | 8.4% |
| 0 | 5689 | 5.8% |
| 1 | 5004 | 5.1% |
| 8 | 2502 | 2.5% |
| 6 | 676 | 0.7% |
| Other values (2) | 200 | 0.2% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 98385 |
what_is_your_gender
Categorical
Imbalance 
| Distinct | 4 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 154.2 KiB |
Length
| Max length | 23 |
|---|---|
| Median length | 4 |
| Mean length | 4.5826951 |
| Min length | 4 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Male |
|---|---|
| 2nd row | Male |
| 3rd row | Female |
| 4th row | Male |
| 5th row | Male |
Common Values
| Value | Count | Frequency (%) |
| Male | 16138 | 81.8% |
| Female | 3212 | 16.3% |
| Prefer not to say | 318 | 1.6% |
| Prefer to self-describe | 49 | 0.2% |
Words
| Value | Count | Frequency (%) |
| male | 16138 | 77.7% |
| female | 3212 | 15.5% |
| prefer | 367 | 1.8% |
| to | 367 | 1.8% |
| not | 318 | 1.5% |
| say | 318 | 1.5% |
| self-describe | 49 | 0.2% |
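The word counts in this table can be reproduced from the Common Values counts. A minimal sketch, assuming words are obtained by lowercasing each value and splitting on whitespace; the profiler's actual tokenizer may differ:

```python
from collections import Counter

# Value counts copied from the "Common Values" table for what_is_your_gender.
value_counts = {
    "Male": 16138,
    "Female": 3212,
    "Prefer not to say": 318,
    "Prefer to self-describe": 49,
}

# Assumption: lowercase, then split on whitespace; each word is weighted
# by the number of rows that contain the value.
word_counts = Counter()
for value, count in value_counts.items():
    for word in value.lower().split():
        word_counts[word] += count

total_words = sum(word_counts.values())
for word, count in word_counts.most_common():
    print(f"{word}: {count} ({100 * count / total_words:.1f}%)")
```

Under this assumption "prefer" and "to" each occur 367 times (318 + 49), which matches the table.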
Most occurring characters
| Value | Count | Frequency (%) |
| e | 23443 | 25.9% |
| a | 19668 | 21.8% |
| l | 19399 | 21.5% |
| M | 16138 | 17.9% |
| F | 3212 | 3.6% |
| m | 3212 | 3.6% |
| (space) | 1052 | 1.2% |
| r | 783 | 0.9% |
| o | 685 | 0.8% |
| t | 685 | 0.8% |
| Other values (10) | 2080 | 2.3% |
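The row with count 1052 corresponds to the space character. This can be confirmed by counting characters across the four values, weighted by how often each value occurs. The counts are copied from the Common Values table; the script is an illustration, not the profiler's own code:

```python
from collections import Counter

# Value counts copied from the "Common Values" table for what_is_your_gender.
value_counts = {
    "Male": 16138,
    "Female": 3212,
    "Prefer not to say": 318,
    "Prefer to self-describe": 49,
}

# Count every character, weighted by the number of rows holding the value.
char_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        char_counts[ch] += count

total_chars = sum(char_counts.values())
print("total characters:", total_chars)
print("spaces:", char_counts[" "])
print("e:", char_counts["e"])
```

The total also matches the 90357 characters reported for this variable.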
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 90357 |
| Distinct | 59 |
|---|---|
| Distinct (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 154.2 KiB |
Length
| Max length | 52 |
|---|---|
| Median length | 28 |
| Mean length | 10.232642 |
| Min length | 4 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | France |
|---|---|
| 2nd row | India |
| 3rd row | Germany |
| 4th row | Australia |
| 5th row | India |
| Value | Count | Frequency (%) |
| india | 4786 | 14.3% |
| of | 3736 | 11.2% |
| united | 3567 | 10.6% |
| states | 3085 | 9.2% |
| america | 3085 | 9.2% |
| other | 1054 | 3.1% |
| brazil | 728 | 2.2% |
| japan | 673 | 2.0% |
| russia | 626 | 1.9% |
| china | 574 | 1.7% |
| Other values (63) | 11583 | 34.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| a | 24865 | 12.3% |
| i | 19357 | 9.6% |
| e | 17291 | 8.6% |
| n | 16760 | 8.3% |
| t | 14087 | 7.0% |
| (space) | 13780 | 6.8% |
| d | 11371 | 5.6% |
| r | 11349 | 5.6% |
| o | 7043 | 3.5% |
| I | 6091 | 3.0% |
| Other values (39) | 59763 | 29.6% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 201757 |
Missing 
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 394 |
| Missing (%) | 2.0% |
| Memory size | 154.2 KiB |
Length
| Max length | 65 |
|---|---|
| Median length | 15 |
| Mean length | 18.286446 |
| Min length | 15 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Master’s degree |
|---|---|
| 2nd row | Professional degree |
| 3rd row | Professional degree |
| 4th row | Master’s degree |
| 5th row | Bachelor’s degree |
Common Values
| Value | Count | Frequency (%) |
| Master’s degree | 8549 | 43.4% |
| Bachelor’s degree | 5993 | 30.4% |
| Doctoral degree | 2767 | 14.0% |
| Some college/university study without earning a bachelor’s degree | 837 | 4.2% |
| Professional degree | 611 | 3.1% |
| I prefer not to answer | 333 | 1.7% |
| No formal education past high school | 233 | 1.2% |
| (Missing) | 394 | 2.0% |
Words
| Value | Count | Frequency (%) |
| degree | 18757 | 41.1% |
| master’s | 8549 | 18.7% |
| bachelor’s | 6830 | 15.0% |
| doctoral | 2767 | 6.1% |
| some | 837 | 1.8% |
| college/university | 837 | 1.8% |
| study | 837 | 1.8% |
| without | 837 | 1.8% |
| earning | 837 | 1.8% |
| a | 837 | 1.8% |
| Other values (12) | 3674 | 8.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 77678 | 22.0% |
| r | 40420 | 11.4% |
| s | 27623 | 7.8% |
| (space) | 26276 | 7.4% |
| a | 21463 | 6.1% |
| g | 20664 | 5.8% |
| d | 19827 | 5.6% |
| o | 17928 | 5.1% |
| t | 15796 | 4.5% |
| ’ | 15379 | 4.4% |
| Other values (21) | 70295 | 19.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 353349 |
select_the_title_most_similar_to_your_current_role_or_most_recent_title_if_retired
Categorical
Missing 
| Distinct | 12 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 610 |
| Missing (%) | 3.1% |
| Memory size | 154.2 KiB |
Length
| Max length | 23 |
|---|---|
| Median length | 18 |
| Mean length | 12.61276 |
| Min length | 5 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Software Engineer |
|---|---|
| 2nd row | Software Engineer |
| 3rd row | Other |
| 4th row | Other |
| 5th row | Data Scientist |
Common Values
| Value | Count | Frequency (%) |
| Data Scientist | 4085 | 20.7% |
| Student | 4014 | 20.4% |
| Software Engineer | 2705 | 13.7% |
| Other | 1690 | 8.6% |
| Data Analyst | 1598 | 8.1% |
| Research Scientist | 1470 | 7.5% |
| Not employed | 942 | 4.8% |
| Business Analyst | 778 | 3.9% |
| Product/Project Manager | 723 | 3.7% |
| Data Engineer | 624 | 3.2% |
| Other values (2) | 478 | 2.4% |
| (Missing) | 610 | 3.1% |
Words
| Value | Count | Frequency (%) |
| data | 6307 | 19.6% |
| scientist | 5555 | 17.3% |
| student | 4014 | 12.5% |
| engineer | 3485 | 10.8% |
| software | 2705 | 8.4% |
| analyst | 2376 | 7.4% |
| other | 1690 | 5.3% |
| research | 1470 | 4.6% |
| not | 942 | 2.9% |
| employed | 942 | 2.9% |
| Other values (5) | 2702 | 8.4% |
Most occurring characters
| Value | Count | Frequency (%) |
| t | 35726 | 14.8% |
| e | 28138 | 11.7% |
| a | 21723 | 9.0% |
| n | 20738 | 8.6% |
| i | 16339 | 6.8% |
| (space) | 13081 | 5.4% |
| S | 12596 | 5.2% |
| s | 12213 | 5.1% |
| r | 11519 | 4.8% |
| c | 8793 | 3.6% |
| Other values (23) | 60126 | 24.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 240992 |
what_is_the_size_of_the_company_where_you_are_employed
Categorical
Missing 
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 5715 |
| Missing (%) | 29.0% |
| Memory size | 154.2 KiB |
Length
| Max length | 20 |
|---|---|
| Median length | 18 |
| Mean length | 16.76282 |
| Min length | 14 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1000-9,999 employees |
|---|---|
| 2nd row | > 10,000 employees |
| 3rd row | > 10,000 employees |
| 4th row | 0-49 employees |
| 5th row | 0-49 employees |
Common Values
| Value | Count | Frequency (%) |
| 0-49 employees | 4025 | 20.4% |
| > 10,000 employees | 3160 | 16.0% |
| 1000-9,999 employees | 2641 | 13.4% |
| 50-249 employees | 2329 | 11.8% |
| 250-999 employees | 1847 | 9.4% |
| (Missing) | 5715 | 29.0% |
Words
| Value | Count | Frequency (%) |
| employees | 14002 | 44.9% |
| 0-49 | 4025 | 12.9% |
| > | 3160 | 10.1% |
| 10,000 | 3160 | 10.1% |
| 1000-9,999 | 2641 | 8.5% |
| 50-249 | 2329 | 7.5% |
| 250-999 | 1847 | 5.9% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 42006 | 17.9% |
| 0 | 28764 | 12.3% |
| 9 | 22459 | 9.6% |
| (space) | 17162 | 7.3% |
| o | 14002 | 6.0% |
| s | 14002 | 6.0% |
| y | 14002 | 6.0% |
| l | 14002 | 6.0% |
| p | 14002 | 6.0% |
| m | 14002 | 6.0% |
| Other values (7) | 40310 | 17.2% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 234713 |
Missing 
| Distinct | 7 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 6094 |
| Missing (%) | 30.9% |
| Memory size | 154.2 KiB |
Length
| Max length | 5 |
|---|---|
| Median length | 3 |
| Mean length | 2.9286501 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 0 |
|---|---|
| 2nd row | 20+ |
| 3rd row | 20+ |
| 4th row | 0 |
| 5th row | 3-4 |
Common Values
| Value | Count | Frequency (%) |
| 20+ | 3178 | 16.1% |
| 1-2 | 3005 | 15.2% |
| 3-4 | 2319 | 11.8% |
| 0 | 1880 | 9.5% |
| 5-9 | 1847 | 9.4% |
| 10-14 | 967 | 4.9% |
| 15-19 | 427 | 2.2% |
| (Missing) | 6094 | 30.9% |
Words
| Value | Count | Frequency (%) |
| 20 | 3178 | 23.3% |
| 1-2 | 3005 | 22.1% |
| 3-4 | 2319 | 17.0% |
| 0 | 1880 | 13.8% |
| 5-9 | 1847 | 13.6% |
| 10-14 | 967 | 7.1% |
| 15-19 | 427 | 3.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| - | 8565 | 21.5% |
| 2 | 6183 | 15.5% |
| 0 | 6025 | 15.1% |
| 1 | 5793 | 14.5% |
| 4 | 3286 | 8.2% |
| + | 3178 | 8.0% |
| 3 | 2319 | 5.8% |
| 5 | 2274 | 5.7% |
| 9 | 2274 | 5.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 39897 |
does_your_current_employer_incorporate_machine_learning_methods_into_their_business
Categorical
Missing 
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 6490 |
| Missing (%) | 32.9% |
| Memory size | 154.2 KiB |
Length
| Max length | 89 |
|---|---|
| Median length | 86 |
| Mean length | 66.814017 |
| Min length | 13 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | I do not know |
|---|---|
| 2nd row | We have well established ML methods (i.e., models in production for more than 2 years) |
| 3rd row | I do not know |
| 4th row | No (we do not use ML methods) |
| 5th row | We have well established ML methods (i.e., models in production for more than 2 years) |
Common Values
| Value | Count | Frequency (%) |
| We are exploring ML methods (and may one day put a model into production) | 2812 | 14.3% |
| We recently started using ML methods (i.e., models in production for less than 2 years) | 2731 | 13.9% |
| We have well established ML methods (i.e., models in production for more than 2 years) | 2528 | 12.8% |
| No (we do not use ML methods) | 2415 | 12.2% |
| We use ML methods for generating insights (but do not put working models into production) | 1550 | 7.9% |
| I do not know | 1191 | 6.0% |
| (Missing) | 6490 | 32.9% |
Words
| Value | Count | Frequency (%) |
| we | 12036 | 7.4% |
| ml | 12036 | 7.4% |
| methods | 12036 | 7.4% |
| production | 9621 | 5.9% |
| for | 6809 | 4.2% |
| models | 6809 | 4.2% |
| years | 5259 | 3.2% |
| 2 | 5259 | 3.2% |
| than | 5259 | 3.2% |
| i.e | 5259 | 3.2% |
| Other values (29) | 82789 | 50.7% |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 149945 | 17.0% |
| e | 83276 | 9.4% |
| o | 75690 | 8.6% |
| t | 56167 | 6.4% |
| n | 50946 | 5.8% |
| d | 47317 | 5.4% |
| s | 47149 | 5.3% |
| i | 38772 | 4.4% |
| r | 38403 | 4.3% |
| a | 33915 | 3.8% |
| Other values (24) | 262169 | 29.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 883749 |
what_is_your_current_yearly_compensation_approximate_$_usd
Categorical
Missing 
| Distinct | 25 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 7220 |
| Missing (%) | 36.6% |
| Memory size | 154.2 KiB |
Length
| Max length | 15 |
|---|---|
| Median length | 13 |
| Mean length | 12.04361 |
| Min length | 6 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 30,000-39,999 |
|---|---|
| 2nd row | 5,000-7,499 |
| 3rd row | 250,000-299,999 |
| 4th row | 4,000-4,999 |
| 5th row | 60,000-69,999 |
Common Values
| Value | Count | Frequency (%) |
| $0-999 | 1513 | 7.7% |
| 10,000-14,999 | 833 | 4.2% |
| 100,000-124,999 | 750 | 3.8% |
| 30,000-39,999 | 728 | 3.7% |
| 40,000-49,999 | 719 | 3.6% |
| 50,000-59,999 | 704 | 3.6% |
| 1,000-1,999 | 599 | 3.0% |
| 60,000-69,999 | 576 | 2.9% |
| 5,000-7,499 | 536 | 2.7% |
| 15,000-19,999 | 529 | 2.7% |
| Other values (15) | 5010 | 25.4% |
| (Missing) | 7220 | 36.6% |
Words
| Value | Count | Frequency (%) |
| 0-999 | 1513 | 12.0% |
| 10,000-14,999 | 833 | 6.6% |
| 100,000-124,999 | 750 | 6.0% |
| 30,000-39,999 | 728 | 5.8% |
| 40,000-49,999 | 719 | 5.7% |
| 50,000-59,999 | 704 | 5.6% |
| 1,000-1,999 | 599 | 4.8% |
| 60,000-69,999 | 576 | 4.6% |
| 5,000-7,499 | 536 | 4.3% |
| 15,000-19,999 | 529 | 4.2% |
| Other values (16) | 5093 | 40.5% |
Most occurring characters
| Value | Count | Frequency (%) |
| 9 | 44336 | 29.5% |
| 0 | 42462 | 28.2% |
| , | 21885 | 14.5% |
| - | 12414 | 8.2% |
| 1 | 7256 | 4.8% |
| 4 | 5309 | 3.5% |
| 5 | 4502 | 3.0% |
| 2 | 4489 | 3.0% |
| 3 | 2140 | 1.4% |
| 7 | 1992 | 1.3% |
| Other values (5) | 3724 | 2.5% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 150509 |
Missing 
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 7467 |
| Missing (%) | 37.9% |
| Memory size | 154.2 KiB |
Length
| Max length | 17 |
|---|---|
| Median length | 15 |
| Mean length | 10.101388 |
| Min length | 6 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | $0 (USD) |
|---|---|
| 2nd row | > $100,000 ($USD) |
| 3rd row | $10,000-$99,999 |
| 4th row | $0 (USD) |
| 5th row | $10,000-$99,999 |
Common Values
| Value | Count | Frequency (%) |
| $0 (USD) | 4038 | 20.5% |
| $100-$999 | 2335 | 11.8% |
| $1000-$9,999 | 2123 | 10.8% |
| $1-$99 | 1485 | 7.5% |
| $10,000-$99,999 | 1268 | 6.4% |
| > $100,000 ($USD) | 1001 | 5.1% |
| (Missing) | 7467 | 37.9% |
Words
| Value | Count | Frequency (%) |
| usd | 5039 | 27.6% |
| 0 | 4038 | 22.1% |
| 100-$999 | 2335 | 12.8% |
| 1000-$9,999 | 2123 | 11.6% |
| 1-$99 | 1485 | 8.1% |
| 10,000-$99,999 | 1268 | 6.9% |
| > | 1001 | 5.5% |
| 100,000 | 1001 | 5.5% |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 25154 | 20.3% |
| 9 | 24807 | 20.0% |
| $ | 20462 | 16.5% |
| 1 | 8212 | 6.6% |
| - | 7211 | 5.8% |
| (space) | 6040 | 4.9% |
| , | 5660 | 4.6% |
| ( | 5039 | 4.1% |
| U | 5039 | 4.1% |
| S | 5039 | 4.1% |
| Other values (3) | 11079 | 9.0% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 123742 |
| Distinct | 4975 |
|---|---|
| Distinct (%) | 25.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 154.2 KiB |
Length
| Max length | 89 |
|---|---|
| Median length | 86 |
| Mean length | 63.65938 |
| Min length | 18 |
Unique
| Unique | 4355 |
|---|---|
| Unique (%) | 22.1% |
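Distinct and Unique measure different things: Distinct counts how many different values occur, while Unique counts the values that occur exactly once. A minimal sketch of the distinction on a toy column (the toy data and the function name are invented for illustration), followed by a check of the two percentages against the 19717 observations:

```python
from collections import Counter

def distinct_and_unique(values):
    """Return (number of different values, number of values seen exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique

# Toy column, invented for illustration only.
toy = ["a", "b", "b", "c", "c", "c", "d"]
distinct, unique = distinct_and_unique(toy)
print(distinct, unique)

# Percentages for this variable, from the counts reported above.
n_observations = 19717
print(f"distinct: {100 * 4975 / n_observations:.1f}%")
print(f"unique: {100 * 4355 / n_observations:.1f}%")
```

In the toy column four different values occur, and two of them ("a" and "d") occur only once.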
Sample
| 1st row | Basic statistical software (Microsoft Excel, Google Sheets, etc.), 0, -1, -1, -1, -1 |
|---|---|
| 2nd row | Cloud-based data software & APIs (AWS, GCP, Azure, etc.), -1, -1, -1, -1, 0 |
| 3rd row | -1, -1, -1, -1, -1 |
| 4th row | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 0, -1 |
| 5th row | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 1, -1 |
| Value | Count | Frequency (%) |
| 1 | 85278 | 43.2% |
| etc | 14500 | 7.3% |
| local | 8475 | 4.3% |
| development | 8475 | 4.3% |
| environments | 8475 | 4.3% |
| rstudio | 8475 | 4.3% |
| jupyterlab | 8475 | 4.3% |
| software | 6025 | 3.1% |
| statistical | 3956 | 2.0% |
| excel | 3061 | 1.6% |
| Other values (2861) | 42147 | 21.4% |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 177625 | 14.2% |
| , | 125627 | 10.0% |
| e | 95128 | 7.6% |
| 1 | 90608 | 7.2% |
| - | 85279 | 6.8% |
| t | 76555 | 6.1% |
| o | 55119 | 4.4% |
| a | 41050 | 3.3% |
| c | 38771 | 3.1% |
| s | 37495 | 3.0% |
| Other values (45) | 431915 | 34.4% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 1255172 |
how_long_have_you_been_writing_code_to_analyze_data_at_work_or_at_school
Categorical
Missing 
| Distinct | 7 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 4090 |
| Missing (%) | 20.7% |
| Memory size | 154.2 KiB |
Length
| Max length | 25 |
|---|---|
| Median length | 9 |
| Mean length | 10.140142 |
| Min length | 9 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1-2 years |
|---|---|
| 2nd row | I have never written code |
| 3rd row | 1-2 years |
| 4th row | < 1 years |
| 5th row | 20+ years |
Common Values
| Value | Count | Frequency (%) |
| 1-2 years | 4061 | 20.6% |
| < 1 years | 3828 | 19.4% |
| 3-5 years | 3365 | 17.1% |
| 5-10 years | 1887 | 9.6% |
| 10-20 years | 1045 | 5.3% |
| I have never written code | 865 | 4.4% |
| 20+ years | 576 | 2.9% |
| (Missing) | 4090 | 20.7% |
Words
| Value | Count | Frequency (%) |
| years | 14762 | 39.2% |
| 1-2 | 4061 | 10.8% |
| < | 3828 | 10.2% |
| 1 | 3828 | 10.2% |
| 3-5 | 3365 | 8.9% |
| 5-10 | 1887 | 5.0% |
| 10-20 | 1045 | 2.8% |
| i | 865 | 2.3% |
| have | 865 | 2.3% |
| never | 865 | 2.3% |
| Other values (3) | 2306 | 6.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 22050 | 13.9% |
| e | 19087 | 12.0% |
| r | 16492 | 10.4% |
| a | 15627 | 9.9% |
| y | 14762 | 9.3% |
| s | 14762 | 9.3% |
| 1 | 10821 | 6.8% |
| - | 10358 | 6.5% |
| 2 | 5682 | 3.6% |
| 5 | 5252 | 3.3% |
| Other values (14) | 23567 | 14.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 158460 |
what_programming_language_would_you_recommend_an_aspiring_data_scientist_to_learn_first
Categorical
Imbalance  Missing 
| Distinct | 12 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 5340 |
| Missing (%) | 27.1% |
| Memory size | 154.2 KiB |
Length
| Max length | 10 |
|---|---|
| Median length | 6 |
| Mean length | 5.2444182 |
| Min length | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Python |
|---|---|
| 2nd row | Python |
| 3rd row | Python |
| 4th row | Java |
| 5th row | Python |
Common Values
| Value | Count | Frequency (%) |
| Python | 11316 | |
| R | 1343 | 6.8% |
| SQL | 817 | 4.1% |
| C++ | 199 | 1.0% |
| MATLAB | 162 | 0.8% |
| C | 153 | 0.8% |
| Other | 127 | 0.6% |
| Java | 104 | 0.5% |
| None | 69 | 0.3% |
| Javascript | 47 | 0.2% |
| Other values (2) | 40 | 0.2% |
| (Missing) | 5340 |
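The Frequency (%) column is blank for the largest value and for the (Missing) row. A minimal sketch of how those shares could be recomputed, assuming each percentage is the count divided by the 19717 observations reported in the dataset statistics. That assumption reproduces the printed 6.8% for R and 4.1% for SQL. The dictionary and variable names are illustrative only.

```python
# Counts copied from the Common Values table above.
counts = {
    "Python": 11316, "R": 1343, "SQL": 817, "C++": 199, "MATLAB": 162,
    "C": 153, "Other": 127, "Java": 104, "None": 69, "Javascript": 47,
    "Other values (2)": 40, "(Missing)": 5340,
}

# Assumption: the denominator is the total number of observations,
# so missing answers are included in the total.
n_observations = sum(counts.values())

# Share of each value as a percentage of all observations.
shares = {value: 100 * count / n_observations for value, count in counts.items()}

for value, share in shares.items():
    print(f"{value:<18} {counts[value]:>6} {share:5.1f}%")
```

Under this assumption the Python row comes out at roughly 57.4% and the (Missing) row at 27.1%, which agrees with the Missing (%) figure in the summary table.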
Length
| Value | Count | Frequency (%) |
| python | 11316 | |
| r | 1343 | 9.3% |
| sql | 817 | 5.7% |
| c | 352 | 2.4% |
| matlab | 162 | 1.1% |
| other | 127 | 0.9% |
| java | 104 | 0.7% |
| none | 69 | 0.5% |
| javascript | 47 | 0.3% |
| bash | 35 | 0.2% |
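The word table above lists `c` with 352 occurrences, which equals the 199 `C++` answers plus the 153 `C` answers in Common Values. A rough sketch of why that can happen, assuming the profiler lowercases each value and keeps only runs of word characters. The regular expression below is an assumption for illustration, not the tool's confirmed tokenizer.

```python
import re
from collections import Counter

# Counts copied from the Common Values table for this variable.
common_values = {
    "Python": 11316, "R": 1343, "SQL": 817, "C++": 199, "MATLAB": 162,
    "C": 153, "Other": 127, "Java": 104, "None": 69, "Javascript": 47,
}

word_counts = Counter()
for value, count in common_values.items():
    # Assumed tokenization: lowercase, then split into runs of word characters.
    # The "+" signs in "C++" are not word characters, so only "c" survives.
    for token in re.findall(r"\w+", value.lower()):
        word_counts[token] += count

print(word_counts["c"])
```

Because of this merge, per-language counts are safer to read from Common Values than from the word table.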
Most occurring characters
| Value | Count | Frequency (%) |
| t | 11495 | |
| h | 11478 | |
| o | 11385 | |
| n | 11385 | |
| y | 11321 | |
| P | 11316 | |
| R | 1343 | 1.8% |
| L | 979 | 1.3% |
| S | 822 | 1.1% |
| Q | 817 | 1.1% |
| Other values (17) | 3058 | 4.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 75399 |
have_you_ever_used_a_tpu_tensor_processing_unit
Categorical
Imbalance  Missing 
| Distinct | 5 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 5514 |
| Missing (%) | 28.0% |
| Memory size | 154.2 KiB |
| Never | 11495 |
|---|---|
| Once | 1320 |
| 2-5 times | 1037 |
| 6-24 times | 193 |
| > 25 times | 158 |
Length
| Max length | 10 |
|---|---|
| Median length | 5 |
| Mean length | 5.3226783 |
| Min length | 4 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | Never |
|---|---|
| 2nd row | Once |
| 3rd row | Never |
| 4th row | Never |
| 5th row | 6-24 times |
Common Values
| Value | Count | Frequency (%) |
| Never | 11495 | |
| Once | 1320 | 6.7% |
| 2-5 times | 1037 | 5.3% |
| 6-24 times | 193 | 1.0% |
| > 25 times | 158 | 0.8% |
| (Missing) | 5514 |
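The length statistics for this variable can be cross-checked against the Common Values counts. A short sketch, assuming length means the number of characters in each non-missing answer; the variable names are illustrative.

```python
# Counts copied from the Common Values table above (missing answers excluded).
counts = {
    "Never": 11495,
    "Once": 1320,
    "2-5 times": 1037,
    "6-24 times": 193,
    "> 25 times": 158,
}

n_answers = sum(counts.values())

# Total characters across all non-missing answers.
total_chars = sum(len(value) * count for value, count in counts.items())

# Mean answer length in characters.
mean_length = total_chars / n_answers

print(n_answers, total_chars, round(mean_length, 7))
```

The result agrees with the reported mean length of 5.3226783 and with the character total of 75598 listed for this variable.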
Length
Common Values (Plot)
| Value | Count | Frequency (%) |
| never | 11495 | |
| times | 1388 | 8.8% |
| once | 1320 | 8.4% |
| 2-5 | 1037 | 6.6% |
| 6-24 | 193 | 1.2% |
| > | 158 | 1.0% |
| 25 | 158 | 1.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 25698 | |
| N | 11495 | |
| v | 11495 | |
| r | 11495 | |
| (space) | 1546 | 2.0% |
| s | 1388 | 1.8% |
| 2 | 1388 | 1.8% |
| t | 1388 | 1.8% |
| i | 1388 | 1.8% |
| m | 1388 | 1.8% |
| Other values (8) | 6929 | 9.2% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 75598 |
for_how_many_years_have_you_used_machine_learning_methods
Categorical
Missing 
| Distinct | 8 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 5535 |
| Missing (%) | 28.1% |
| Memory size | 154.2 KiB |
| < 1 years | 5149 |
|---|---|
| 1-2 years | 3798 |
| 2-3 years | 1840 |
| 3-4 years | 1080 |
| 4-5 years | 927 |
| Other values (3) | 1388 |
Length
| Max length | 11 |
|---|---|
| Median length | 9 |
| Mean length | 9.1086589 |
| Min length | 9 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1-2 years |
|---|---|
| 2nd row | 2-3 years |
| 3rd row | < 1 years |
| 4th row | 10-15 years |
| 5th row | 2-3 years |
Common Values
| Value | Count | Frequency (%) |
| < 1 years | 5149 | |
| 1-2 years | 3798 | |
| 2-3 years | 1840 | 9.3% |
| 3-4 years | 1080 | 5.5% |
| 4-5 years | 927 | 4.7% |
| 5-10 years | 869 | 4.4% |
| 10-15 years | 336 | 1.7% |
| 20+ years | 183 | 0.9% |
| (Missing) | 5535 |
Length
Common Values (Plot)
| Value | Count | Frequency (%) |
| years | 14182 | |
| < | 5149 | 15.4% |
| 1 | 5149 | 15.4% |
| 1-2 | 3798 | 11.3% |
| 2-3 | 1840 | 5.5% |
| 3-4 | 1080 | 3.2% |
| 4-5 | 927 | 2.8% |
| 5-10 | 869 | 2.6% |
| 10-15 | 336 | 1.0% |
| 20 | 183 | 0.5% |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 19331 | |
| y | 14182 | |
| e | 14182 | |
| a | 14182 | |
| r | 14182 | |
| s | 14182 | |
| 1 | 10488 | |
| - | 8850 | |
| 2 | 5821 | 4.5% |
| < | 5149 | 4.0% |
| Other values (5) | 8630 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 129179 |
| Distinct | 98 |
|---|---|
| Distinct (%) | 1.1% |
| Missing | 10491 |
| Missing (%) | 53.2% |
| Memory size | 154.2 KiB |
Length
| Max length | 485 |
|---|---|
| Median length | 364 |
| Mean length | 207.43822 |
| Min length | 5 |
Unique
| Unique | 13 |
|---|---|
| Unique (%) | 0.1% |
Sample
| 1st row | Analyze and understand data to influence product or business decisions, Build and/or run the data infrastructure that my business uses for storing, analyzing, and operationalizing data, Build prototypes to explore applying machine learning to new areas, Build and/or run a machine learning service that operationally improves my product or workflows |
|---|---|
| 2nd row | Build prototypes to explore applying machine learning to new areas, Do research that advances the state of the art of machine learning |
| 3rd row | Analyze and understand data to influence product or business decisions, Experimentation and iteration to improve existing ML models, Do research that advances the state of the art of machine learning |
| 4th row | Analyze and understand data to influence product or business decisions, Build prototypes to explore applying machine learning to new areas, Build and/or run a machine learning service that operationally improves my product or workflows |
| 5th row | Other |
| Value | Count | Frequency (%) |
| to | 19758 | 7.1% |
| and | 13362 | 4.8% |
| data | 13223 | 4.7% |
| build | 11895 | 4.3% |
| machine | 10688 | 3.8% |
| learning | 10688 | 3.8% |
| business | 9657 | 3.5% |
| product | 9439 | 3.4% |
| or | 9439 | 3.4% |
| that | 9273 | 3.3% |
| Other values (47) | 162326 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 270522 | |
| n | 158935 | 8.3% |
| a | 154761 | 8.1% |
| e | 153868 | 8.0% |
| t | 132483 | 6.9% |
| i | 125681 | 6.6% |
| o | 122671 | 6.4% |
| r | 120312 | 6.3% |
| s | 97063 | 5.1% |
| d | 79170 | 4.1% |
| Other values (25) | 498359 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 1913825 |
| Distinct | 1020 |
|---|---|
| Distinct (%) | 6.1% |
| Missing | 2936 |
| Missing (%) | 14.9% |
| Memory size | 154.2 KiB |
Length
| Max length | 512 |
|---|---|
| Median length | 412 |
| Mean length | 153.9391 |
| Min length | 4 |
Unique
| Unique | 313 |
|---|---|
| Unique (%) | 1.9% |
Sample
| 1st row | Twitter (data science influencers), Kaggle (forums, blog, social media, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc) |
|---|---|
| 2nd row | Kaggle (forums, blog, social media, etc), YouTube (Cloud AI Adventures, Siraj Raval, etc), Podcasts (Chai Time Data Science, Linear Digressions, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc) |
| 3rd row | Podcasts (Chai Time Data Science, Linear Digressions, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc), Slack Communities (ods.ai, kagglenoobs, etc) |
| 4th row | YouTube (Cloud AI Adventures, Siraj Raval, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Other |
| 5th row | YouTube (Cloud AI Adventures, Siraj Raval, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc) |
| Value | Count | Frequency (%) |
| etc | 44329 | 14.0% |
| data | 15722 | 5.0% |
| science | 15722 | 5.0% |
| forums | 14506 | 4.6% |
| kaggle | 10751 | 3.4% |
| blog | 10751 | 3.4% |
| social | 10751 | 3.4% |
| media | 10751 | 3.4% |
| kdnuggets | 9907 | 3.1% |
| vidhya | 9907 | 3.1% |
| Other values (36) | 164129 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 300445 | 11.6% |
| e | 194465 | 7.5% |
| a | 181119 | 7.0% |
| i | 150113 | 5.8% |
| , | 140412 | 5.4% |
| t | 138919 | 5.4% |
| s | 131026 | 5.1% |
| c | 129271 | 5.0% |
| o | 120553 | 4.7% |
| n | 104219 | 4.0% |
| Other values (39) | 992710 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 2583252 |
| Distinct | 819 |
|---|---|
| Distinct (%) | 4.9% |
| Missing | 3148 |
| Missing (%) | 16.0% |
| Memory size | 154.2 KiB |
Length
| Max length | 176 |
|---|---|
| Median length | 151 |
| Mean length | 40.252942 |
| Min length | 3 |
Unique
| Unique | 262 |
|---|---|
| Unique (%) | 1.6% |
Sample
| 1st row | Coursera, DataCamp, Kaggle Courses (i.e. Kaggle Learn), Udemy |
|---|---|
| 2nd row | Coursera, DataCamp, Kaggle Courses (i.e. Kaggle Learn), Udemy |
| 3rd row | Coursera, edX, DataCamp, University Courses (resulting in a university degree) |
| 4th row | Other |
| 5th row | None |
| Value | Count | Frequency (%) |
| kaggle | 10238 | |
| courses | 9597 | |
| university | 8956 | 10.1% |
| coursera | 8685 | 9.8% |
| i.e | 5119 | 5.8% |
| learn | 5119 | 5.8% |
| udemy | 4804 | 5.4% |
| resulting | 4478 | 5.1% |
| in | 4478 | 5.1% |
| a | 4478 | 5.1% |
| Other values (10) | 22616 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 80252 | 12.0% |
| (space) | 71999 | 10.8% |
| r | 53153 | 8.0% |
| a | 48811 | 7.3% |
| s | 43576 | 6.5% |
| i | 39026 | 5.9% |
| g | 30715 | 4.6% |
| n | 29654 | 4.4% |
| u | 27981 | 4.2% |
| t | 25108 | 3.8% |
| Other values (25) | 216676 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 666951 |
which_of_the_following_integrated_development_environments_id_es_do_you_use_on_a_regular_basis
Text
Missing 
| Distinct | 853 |
|---|---|
| Distinct (%) | 5.8% |
| Missing | 5090 |
| Missing (%) | 25.8% |
| Memory size | 154.2 KiB |
Length
| Max length | 185 |
|---|---|
| Median length | 160 |
| Mean length | 64.706023 |
| Min length | 4 |
Unique
| Unique | 274 |
|---|---|
| Unique (%) | 1.9% |
Sample
| 1st row | Jupyter (JupyterLab, Jupyter Notebooks, etc) , RStudio , PyCharm , MATLAB , Spyder |
|---|---|
| 2nd row | Jupyter (JupyterLab, Jupyter Notebooks, etc) , Visual Studio / Visual Studio Code |
| 3rd row | Jupyter (JupyterLab, Jupyter Notebooks, etc) |
| 4th row | RStudio , Other |
| 5th row | Jupyter (JupyterLab, Jupyter Notebooks, etc) , Spyder , Notepad++ , Sublime Text |
| Value | Count | Frequency (%) |
| 30693 | ||
| jupyter | 21608 | |
| notebooks | 10804 | 8.0% |
| etc | 10804 | 8.0% |
| jupyterlab | 10804 | 8.0% |
| visual | 9068 | 6.7% |
| studio | 9068 | 6.7% |
| code | 4534 | 3.3% |
| rstudio | 4455 | 3.3% |
| pycharm | 4224 | 3.1% |
| Other values (10) | 19424 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 183434 | |
| t | 75657 | 8.0% |
| e | 71159 | 7.5% |
| u | 57658 | 6.1% |
| o | 55475 | 5.9% |
| , | 45985 | 4.9% |
| r | 40412 | 4.3% |
| y | 39721 | 4.2% |
| p | 38778 | 4.1% |
| J | 32412 | 3.4% |
| Other values (29) | 305764 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 946455 |
| Distinct | 248 |
|---|---|
| Distinct (%) | 1.7% |
| Missing | 5274 |
| Missing (%) | 26.7% |
| Memory size | 154.2 KiB |
Length
| Max length | 295 |
|---|---|
| Median length | 254 |
| Mean length | 29.514851 |
| Min length | 4 |
Unique
| Unique | 100 |
|---|---|
| Unique (%) | 0.7% |
Sample
| 1st row | None |
|---|---|
| 2nd row | Microsoft Azure Notebooks |
| 3rd row | Google Colab , Google Cloud Notebook Products (AI Platform, Datalab, etc) |
| 4th row | None |
| 5th row | Kaggle Notebooks (Kernels) , Google Colab , Binder / JupyterHub |
| Value | Count | Frequency (%) |
| 7815 | ||
| notebooks | 7214 | |
| 5672 | 9.4% | |
| none | 5177 | 8.5% |
| kernels | 4845 | 8.0% |
| kaggle | 4845 | 8.0% |
| colab | 4551 | 7.5% |
| products | 1878 | 3.1% |
| etc | 1878 | 3.1% |
| notebook | 1878 | 3.1% |
| Other values (20) | 14831 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 68912 | |
| o | 55693 | |
| e | 43123 | 10.1% |
| l | 23377 | 5.5% |
| t | 19566 | 4.6% |
| a | 16565 | 3.9% |
| b | 16546 | 3.9% |
| g | 16119 | 3.8% |
| s | 15603 | 3.7% |
| r | 14417 | 3.4% |
| Other values (34) | 136362 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 426283 |
what_programming_languages_do_you_use_on_a_regular_basis
Text
Missing 
| Distinct | 611 |
|---|---|
| Distinct (%) | 4.2% |
| Missing | 5313 |
| Missing (%) | 26.9% |
| Memory size | 154.2 KiB |
Length
| Max length | 70 |
|---|---|
| Median length | 60 |
| Mean length | 14.848792 |
| Min length | 1 |
Unique
| Unique | 215 |
|---|---|
| Unique (%) | 1.5% |
Sample
| 1st row | Python, R, SQL, Java, Javascript, MATLAB |
|---|---|
| 2nd row | Python, R, SQL, Bash |
| 3rd row | Python, SQL |
| 4th row | Python, R |
| 5th row | Python, R, Bash |
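Each answer in this column is a comma-separated list of selections, so per-language usage is easier to read after splitting the answers into individual options. A small sketch using only the five sample rows above, not the full dataset; the names are illustrative.

```python
from collections import Counter

# The five sample rows shown above.
sample_rows = [
    "Python, R, SQL, Java, Javascript, MATLAB",
    "Python, R, SQL, Bash",
    "Python, SQL",
    "Python, R",
    "Python, R, Bash",
]

option_counts = Counter()
for answer in sample_rows:
    # Split on commas and trim surrounding whitespace from each option.
    options = [option.strip() for option in answer.split(",")]
    option_counts.update(options)

for option, count in option_counts.most_common():
    print(option, count)
```

Splitting on the delimiter keeps option names such as `C++` intact, which a word-level tokenizer may not.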
| Value | Count | Frequency (%) |
| python | 12841 | |
| sql | 6532 | |
| r | 4588 | 12.2% |
| c | 3928 | 10.5% |
| java | 2267 | 6.0% |
| javascript | 2174 | 5.8% |
| bash | 2037 | 5.4% |
| matlab | 1516 | 4.0% |
| other | 1148 | 3.1% |
| typescript | 389 | 1.0% |
Most occurring characters
| Value | Count | Frequency (%) |
| , | 23099 | 10.8% |
| (space) | 23099 | 10.8% |
| t | 16552 | 7.7% |
| h | 16026 | 7.5% |
| y | 13230 | 6.2% |
| o | 12924 | 6.0% |
| n | 12924 | 6.0% |
| P | 12841 | 6.0% |
| a | 10919 | 5.1% |
| L | 8048 | 3.8% |
| Other values (19) | 64220 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 213882 |
| Distinct | 439 |
|---|---|
| Distinct (%) | 3.1% |
| Missing | 5464 |
| Missing (%) | 27.7% |
| Memory size | 154.2 KiB |
Length
| Max length | 141 |
|---|---|
| Median length | 130 |
| Mean length | 30.0174 |
| Min length | 4 |
Unique
| Unique | 165 |
|---|---|
| Unique (%) | 1.2% |
Sample
| 1st row | Matplotlib |
|---|---|
| 2nd row | Ggplot / ggplot2 , Matplotlib , Seaborn |
| 3rd row | Matplotlib , Plotly / Plotly Express , Seaborn |
| 4th row | Ggplot / ggplot2 |
| 5th row | Matplotlib , Plotly / Plotly Express , Bokeh , Seaborn |
| Value | Count | Frequency (%) |
| 24947 | ||
| matplotlib | 10516 | |
| seaborn | 6905 | 10.3% |
| plotly | 6434 | 9.6% |
| ggplot | 4182 | 6.2% |
| ggplot2 | 4182 | 6.2% |
| express | 3217 | 4.8% |
| shiny | 1244 | 1.8% |
| none | 1240 | 1.8% |
| d3.js | 1078 | 1.6% |
| Other values (6) | 3419 | 5.1% |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 95205 | |
| l | 44819 | |
| t | 37656 | 8.8% |
| o | 36340 | 8.5% |
| p | 22741 | 5.3% |
| a | 18138 | 4.2% |
| b | 18065 | 4.2% |
| , | 16998 | 4.0% |
| e | 14614 | 3.4% |
| i | 13121 | 3.1% |
| Other values (28) | 110141 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 427838 |
which_types_of_specialized_hardware_do_you_use_on_a_regular_basis
Categorical
Missing 
| Distinct | 14 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 5499 |
| Missing (%) | 27.9% |
| Memory size | 154.2 KiB |
| CPUs, GPUs | 5041 |
|---|---|
| CPUs | 5001 |
| None / I do not know | 2449 |
| GPUs | 1129 |
| CPUs, GPUs, TPUs | 348 |
| Other values (9) | 250 |
Length
| Max length | 23 |
|---|---|
| Median length | 20 |
| Mean length | 9.2723308 |
| Min length | 4 |
Unique
| Unique | 1 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | CPUs, GPUs |
|---|---|
| 2nd row | CPUs, GPUs |
| 3rd row | CPUs, GPUs |
| 4th row | CPUs, GPUs |
| 5th row | CPUs, GPUs |
Common Values
| Value | Count | Frequency (%) |
| CPUs, GPUs | 5041 | |
| CPUs | 5001 | |
| None / I do not know | 2449 | |
| GPUs | 1129 | 5.7% |
| CPUs, GPUs, TPUs | 348 | 1.8% |
| GPUs, TPUs | 82 | 0.4% |
| Other | 50 | 0.3% |
| TPUs | 30 | 0.2% |
| CPUs, TPUs | 30 | 0.2% |
| CPUs, GPUs, Other | 27 | 0.1% |
| Other values (4) | 31 | 0.2% |
| (Missing) | 5499 |
Length
| Value | Count | Frequency (%) |
| cpus | 10472 | |
| gpus | 6638 | |
| none | 2449 | 7.6% |
| / | 2449 | 7.6% |
| i | 2449 | 7.6% |
| do | 2449 | 7.6% |
| not | 2449 | 7.6% |
| know | 2449 | 7.6% |
| tpus | 496 | 1.5% |
| other | 108 | 0.3% |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 18190 | |
| P | 17606 | |
| U | 17606 | |
| s | 17606 | |
| C | 10472 | |
| o | 9796 | |
| n | 7347 | |
| G | 6638 | 5.0% |
| , | 5945 | 4.5% |
| t | 2557 | 1.9% |
| Other values (11) | 18071 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 131834 |
| Distinct | 684 |
|---|---|
| Distinct (%) | 4.9% |
| Missing | 5629 |
| Missing (%) | 28.5% |
| Memory size | 154.2 KiB |
Length
| Max length | 336 |
|---|---|
| Median length | 288 |
| Mean length | 101.29813 |
| Min length | 4 |
Unique
| Unique | 232 |
|---|---|
| Unique (%) | 1.6% |
Sample
| 1st row | Linear or Logistic Regression |
|---|---|
| 2nd row | Linear or Logistic Regression, Convolutional Neural Networks |
| 3rd row | Linear or Logistic Regression, Decision Trees or Random Forests, Gradient Boosting Machines (xgboost, lightgbm, etc) |
| 4th row | Linear or Logistic Regression, Decision Trees or Random Forests, Gradient Boosting Machines (xgboost, lightgbm, etc), Bayesian Approaches, Convolutional Neural Networks, Generative Adversarial Networks, Recurrent Neural Networks |
| 5th row | Linear or Logistic Regression, Dense Neural Networks (MLPs, etc), Convolutional Neural Networks, Recurrent Neural Networks |
| Value | Count | Frequency (%) |
| or | 18713 | 10.5% |
| networks | 14046 | 7.9% |
| neural | 12162 | 6.8% |
| linear | 10223 | 5.7% |
| logistic | 10223 | 5.7% |
| regression | 10223 | 5.7% |
| etc | 9769 | 5.5% |
| decision | 8490 | 4.8% |
| trees | 8490 | 4.8% |
| random | 8490 | 4.8% |
| Other values (20) | 67134 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 163875 | 11.5% |
| e | 139846 | 9.8% |
| o | 125597 | 8.8% |
| s | 112138 | 7.9% |
| r | 106194 | 7.4% |
| i | 92023 | 6.4% |
| n | 79316 | 5.6% |
| t | 76694 | 5.4% |
| a | 64148 | 4.5% |
| , | 46623 | 3.3% |
| Other values (33) | 420634 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 1427088 |
which_categories_of_ml_tools_do_you_use_on_a_regular_basis
Text
Missing 
| Distinct | 92 |
|---|---|
| Distinct (%) | 0.7% |
| Missing | 5802 |
| Missing (%) | 29.4% |
| Memory size | 154.2 KiB |
Length
| Max length | 374 |
|---|---|
| Median length | 4 |
| Mean length | 44.514625 |
| Min length | 4 |
Unique
| Unique | 15 |
|---|---|
| Unique (%) | 0.1% |
Sample
| 1st row | None |
|---|---|
| 2nd row | Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI) |
| 3rd row | None |
| 4th row | Automated model selection (e.g. auto-sklearn, xcessiv), Automated hyperparameter tuning (e.g. hyperopt, ray.tune), Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI) |
| 5th row | Automated data augmentation (e.g. imgaug, albumentations), Automated feature engineering/selection (e.g. tpot, boruta_py), Automated model selection (e.g. auto-sklearn, xcessiv), Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI) |
| Value | Count | Frequency (%) |
| e.g | 9911 | 13.4% |
| automated | 8733 | 11.8% |
| none | 7822 | 10.6% |
| model | 3650 | 4.9% |
| selection | 3200 | 4.3% |
| auto-sklearn | 3200 | 4.3% |
| xcessiv | 3200 | 4.3% |
| data | 1800 | 2.4% |
| augmentation | 1800 | 2.4% |
| imgaug | 1800 | 2.4% |
| Other values (24) | 28741 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 74310 | 12.0% |
| (space) | 59942 | 9.7% |
| t | 52616 | 8.5% |
| o | 43566 | 7.0% |
| a | 39055 | 6.3% |
| n | 35582 | 5.7% |
| u | 27883 | 4.5% |
| i | 23255 | 3.8% |
| . | 21600 | 3.5% |
| s | 21439 | 3.5% |
| Other values (31) | 220173 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 619421 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| e | 74310 | 12.0% |
| (space) | 59942 | 9.7% |
| t | 52616 | 8.5% |
| o | 43566 | 7.0% |
| a | 39055 | 6.3% |
| n | 35582 | 5.7% |
| u | 27883 | 4.5% |
| i | 23255 | 3.8% |
| . | 21600 | 3.5% |
| s | 21439 | 3.5% |
| Other values (31) | 220173 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 619421 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| e | 74310 | 12.0% |
| (space) | 59942 | 9.7% |
| t | 52616 | 8.5% |
| o | 43566 | 7.0% |
| a | 39055 | 6.3% |
| n | 35582 | 5.7% |
| u | 27883 | 4.5% |
| i | 23255 | 3.8% |
| . | 21600 | 3.5% |
| s | 21439 | 3.5% |
| Other values (31) | 220173 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 619421 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| e | 74310 | 12.0% |
| (space) | 59942 | 9.7% |
| t | 52616 | 8.5% |
| o | 43566 | 7.0% |
| a | 39055 | 6.3% |
| n | 35582 | 5.7% |
| u | 27883 | 4.5% |
| i | 23255 | 3.8% |
| . | 21600 | 3.5% |
| s | 21439 | 3.5% |
| Other values (31) | 220173 |
which_categories_of_computer_vision_methods_do_you_use_on_a_regular_basis
Categorical
Missing 
| Distinct | 49 |
|---|---|
| Distinct (%) | 0.9% |
| Missing | 14225 |
| Missing (%) | 72.1% |
| Memory size | 154.2 KiB |
| None | |
|---|---|
| Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc) | |
| General purpose image/video tools (PIL, cv2, skimage, etc), Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc) | |
| General purpose image/video tools (PIL, cv2, skimage, etc), Image segmentation methods (U-Net, Mask R-CNN, etc), Object detection methods (YOLOv3, RetinaNet, etc), Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc), Generative Networks (GAN, VAE, etc) | |
| General purpose image/video tools (PIL, cv2, skimage, etc), Image segmentation methods (U-Net, Mask R-CNN, etc), Object detection methods (YOLOv3, RetinaNet, etc), Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc) | |
| Other values (44) |
Length
| Max length | 324 |
|---|---|
| Median length | 271 |
| Mean length | 136.52203 |
| Min length | 4 |
Unique
| Unique | 9 |
|---|---|
| Unique (%) | 0.2% |
Sample
| 1st row | General purpose image/video tools (PIL, cv2, skimage, etc), Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc) |
|---|---|
| 2nd row | None |
| 3rd row | General purpose image/video tools (PIL, cv2, skimage, etc), Image segmentation methods (U-Net, Mask R-CNN, etc), Object detection methods (YOLOv3, RetinaNet, etc) |
| 4th row | General purpose image/video tools (PIL, cv2, skimage, etc), Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc) |
| 5th row | General purpose image/video tools (PIL, cv2, skimage, etc), Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc) |
Common Values
| Value | Count | Frequency (%) |
| None | 1203 | 6.1% |
| Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc) | 560 | 2.8% |
| General purpose image/video tools (PIL, cv2, skimage, etc), Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc) | 366 | 1.9% |
| General purpose image/video tools (PIL, cv2, skimage, etc), Image segmentation methods (U-Net, Mask R-CNN, etc), Object detection methods (YOLOv3, RetinaNet, etc), Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc), Generative Networks (GAN, VAE, etc) | 341 | 1.7% |
| General purpose image/video tools (PIL, cv2, skimage, etc), Image segmentation methods (U-Net, Mask R-CNN, etc), Object detection methods (YOLOv3, RetinaNet, etc), Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc) | 326 | 1.7% |
| General purpose image/video tools (PIL, cv2, skimage, etc), Image segmentation methods (U-Net, Mask R-CNN, etc), Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc) | 243 | 1.2% |
| Image segmentation methods (U-Net, Mask R-CNN, etc), Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc) | 237 | 1.2% |
| General purpose image/video tools (PIL, cv2, skimage, etc) | 233 | 1.2% |
| Image segmentation methods (U-Net, Mask R-CNN, etc), Object detection methods (YOLOv3, RetinaNet, etc), Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc) | 229 | 1.2% |
| General purpose image/video tools (PIL, cv2, skimage, etc), Object detection methods (YOLOv3, RetinaNet, etc), Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc) | 224 | 1.1% |
| Other values (39) | 1530 | 7.8% |
| (Missing) | 14225 |
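Each value in this column is a multi-select answer stored as one comma-separated string, and the option labels themselves contain commas inside parentheses. A plain split on commas would therefore break the labels apart. The sketch below is one possible way to count individual options, assuming parentheses are never nested; the helper names are hypothetical and not part of the report.

```python
import re
from collections import Counter

# Split on commas that are NOT followed by a closing parenthesis before any
# opening one, i.e. commas outside "(...)" groups. Assumes no nested parentheses.
_OPTION_SEPARATOR = re.compile(r",\s*(?![^()]*\))")


def split_options(answer):
    """Split one multi-select answer into its individual option labels."""
    return [part.strip() for part in _OPTION_SEPARATOR.split(answer) if part.strip()]


def count_options(answers):
    """Count how often each option appears, skipping missing answers (None)."""
    counts = Counter()
    for answer in answers:
        if answer is None:
            continue
        counts.update(split_options(answer))
    return counts


answers = [
    "None",
    "General purpose image/video tools (PIL, cv2, skimage, etc), "
    "Image segmentation methods (U-Net, Mask R-CNN, etc)",
    None,
]
for option, count in count_options(answers).items():
    print(count, option)
```

Counting per option rather than per combination avoids the long tail of rare combinations grouped under "Other values" in the table above.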
Length
| Value | Count | Frequency (%) |
| etc | 10408 | 11.0% |
| general | 5394 | 5.7% |
| purpose | 5394 | 5.7% |
| image | 5248 | 5.5% |
| networks | 4268 | 4.5% |
| methods | 3933 | 4.2% |
| other | 3238 | 3.4% |
| and | 3187 | 3.4% |
| classification | 3187 | 3.4% |
| inception | 3187 | 3.4% |
| Other values (22) | 47148 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 95383 | 12.7% |
| (space) | 89100 | 11.9% |
| t | 62987 | 8.4% |
| , | 41941 | 5.6% |
| o | 34913 | 4.7% |
| s | 34879 | 4.7% |
| n | 34666 | 4.6% |
| i | 32629 | 4.4% |
| a | 31692 | 4.2% |
| c | 29107 | 3.9% |
| Other values (36) | 262482 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 749779 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| e | 95383 | 12.7% |
| (space) | 89100 | 11.9% |
| t | 62987 | 8.4% |
| , | 41941 | 5.6% |
| o | 34913 | 4.7% |
| s | 34879 | 4.7% |
| n | 34666 | 4.6% |
| i | 32629 | 4.4% |
| a | 31692 | 4.2% |
| c | 29107 | 3.9% |
| Other values (36) | 262482 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 749779 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| e | 95383 | 12.7% |
| (space) | 89100 | 11.9% |
| t | 62987 | 8.4% |
| , | 41941 | 5.6% |
| o | 34913 | 4.7% |
| s | 34879 | 4.7% |
| n | 34666 | 4.6% |
| i | 32629 | 4.4% |
| a | 31692 | 4.2% |
| c | 29107 | 3.9% |
| Other values (36) | 262482 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 749779 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| e | 95383 | 12.7% |
| (space) | 89100 | 11.9% |
| t | 62987 | 8.4% |
| , | 41941 | 5.6% |
| o | 34913 | 4.7% |
| s | 34879 | 4.7% |
| n | 34666 | 4.6% |
| i | 32629 | 4.4% |
| a | 31692 | 4.2% |
| c | 29107 | 3.9% |
| Other values (36) | 262482 |
which_of_the_following_natural_language_processing_nlp_methods_do_you_use_on_a_regular_basis
Categorical
Missing 
| Distinct | 28 |
|---|---|
| Distinct (%) | 0.8% |
| Missing | 16135 |
| Missing (%) | 81.8% |
| Memory size | 154.2 KiB |
| None | |
|---|---|
| Word embeddings/vectors (GLoVe, fastText, word2vec) | |
| Word embeddings/vectors (GLoVe, fastText, word2vec), Encoder-decorder models (seq2seq, vanilla transformers) | |
| Word embeddings/vectors (GLoVe, fastText, word2vec), Encoder-decorder models (seq2seq, vanilla transformers), Contextualized embeddings (ELMo, CoVe), Transformer language models (GPT-2, BERT, XLnet, etc) | |
| Word embeddings/vectors (GLoVe, fastText, word2vec), Encoder-decorder models (seq2seq, vanilla transformers), Transformer language models (GPT-2, BERT, XLnet, etc) | |
| Other values (23) |
Length
| Max length | 210 |
|---|---|
| Median length | 170 |
| Mean length | 74.985204 |
| Min length | 4 |
Unique
| Unique | 5 |
|---|---|
| Unique (%) | 0.1% |
Sample
| 1st row | Word embeddings/vectors (GLoVe, fastText, word2vec), Encoder-decorder models (seq2seq, vanilla transformers) |
|---|---|
| 2nd row | Word embeddings/vectors (GLoVe, fastText, word2vec), Encoder-decorder models (seq2seq, vanilla transformers) |
| 3rd row | Word embeddings/vectors (GLoVe, fastText, word2vec) |
| 4th row | Word embeddings/vectors (GLoVe, fastText, word2vec), Encoder-decorder models (seq2seq, vanilla transformers), Contextualized embeddings (ELMo, CoVe), Transformer language models (GPT-2, BERT, XLnet, etc) |
| 5th row | None |
Common Values
| Value | Count | Frequency (%) |
| None | 1027 | 5.2% |
| Word embeddings/vectors (GLoVe, fastText, word2vec) | 616 | 3.1% |
| Word embeddings/vectors (GLoVe, fastText, word2vec), Encoder-decorder models (seq2seq, vanilla transformers) | 498 | 2.5% |
| Word embeddings/vectors (GLoVe, fastText, word2vec), Encoder-decorder models (seq2seq, vanilla transformers), Contextualized embeddings (ELMo, CoVe), Transformer language models (GPT-2, BERT, XLnet, etc) | 268 | 1.4% |
| Word embeddings/vectors (GLoVe, fastText, word2vec), Encoder-decorder models (seq2seq, vanilla transformers), Transformer language models (GPT-2, BERT, XLnet, etc) | 250 | 1.3% |
| Word embeddings/vectors (GLoVe, fastText, word2vec), Transformer language models (GPT-2, BERT, XLnet, etc) | 230 | 1.2% |
| Encoder-decorder models (seq2seq, vanilla transformers) | 188 | 1.0% |
| Transformer language models (GPT-2, BERT, XLnet, etc) | 115 | 0.6% |
| Word embeddings/vectors (GLoVe, fastText, word2vec), Contextualized embeddings (ELMo, CoVe), Transformer language models (GPT-2, BERT, XLnet, etc) | 79 | 0.4% |
| Word embeddings/vectors (GLoVe, fastText, word2vec), Encoder-decorder models (seq2seq, vanilla transformers), Contextualized embeddings (ELMo, CoVe) | 76 | 0.4% |
| Other values (18) | 235 | 1.2% |
| (Missing) | 16135 |
Length
| Value | Count | Frequency (%) |
| models | 2399 | 8.6% |
| embeddings/vectors | 2115 | 7.6% |
| glove | 2115 | 7.6% |
| fasttext | 2115 | 7.6% |
| word2vec | 2115 | 7.6% |
| word | 2115 | 7.6% |
| seq2seq | 1368 | 4.9% |
| encoder-decorder | 1368 | 4.9% |
| vanilla | 1368 | 4.9% |
| transformers | 1368 | 4.9% |
| Other values (12) | 9510 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 31307 | 11.7% |
| (space) | 24374 | 9.1% |
| o | 18707 | 7.0% |
| r | 17695 | 6.6% |
| d | 16649 | 6.2% |
| s | 15809 | 5.9% |
| , | 11823 | 4.4% |
| n | 11463 | 4.3% |
| t | 10948 | 4.1% |
| a | 9874 | 3.7% |
| Other values (33) | 99948 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 268597 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| e | 31307 | 11.7% |
| (space) | 24374 | 9.1% |
| o | 18707 | 7.0% |
| r | 17695 | 6.6% |
| d | 16649 | 6.2% |
| s | 15809 | 5.9% |
| , | 11823 | 4.4% |
| n | 11463 | 4.3% |
| t | 10948 | 4.1% |
| a | 9874 | 3.7% |
| Other values (33) | 99948 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 268597 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| e | 31307 | 11.7% |
| (space) | 24374 | 9.1% |
| o | 18707 | 7.0% |
| r | 17695 | 6.6% |
| d | 16649 | 6.2% |
| s | 15809 | 5.9% |
| , | 11823 | 4.4% |
| n | 11463 | 4.3% |
| t | 10948 | 4.1% |
| a | 9874 | 3.7% |
| Other values (33) | 99948 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 268597 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| e | 31307 | 11.7% |
| (space) | 24374 | 9.1% |
| o | 18707 | 7.0% |
| r | 17695 | 6.6% |
| d | 16649 | 6.2% |
| s | 15809 | 5.9% |
| , | 11823 | 4.4% |
| n | 11463 | 4.3% |
| t | 10948 | 4.1% |
| a | 9874 | 3.7% |
| Other values (33) | 99948 |
| Distinct | 584 |
|---|---|
| Distinct (%) | 4.2% |
| Missing | 5964 |
| Missing (%) | 30.2% |
| Memory size | 154.2 KiB |
Length
| Max length | 129 |
|---|---|
| Median length | 108 |
| Mean length | 36.025522 |
| Min length | 4 |
Unique
| Unique | 202 |
|---|---|
| Unique (%) | 1.5% |
Sample
| 1st row | None |
|---|---|
| 2nd row | Scikit-learn , TensorFlow , Keras , RandomForest |
| 3rd row | Scikit-learn , RandomForest, Xgboost , LightGBM |
| 4th row | Scikit-learn , TensorFlow , Keras , RandomForest, Xgboost , Caret |
| 5th row | Scikit-learn , TensorFlow , Keras , PyTorch |
| Value | Count | Frequency (%) |
| 23108 | ||
| scikit-learn | 9390 | |
| tensorflow | 5822 | 9.0% |
| keras | 5756 | 8.9% |
| randomforest | 4524 | 7.0% |
| xgboost | 4243 | 6.6% |
| pytorch | 3412 | 5.3% |
| lightgbm | 2166 | 3.4% |
| none | 1720 | 2.7% |
| caret | 1139 | 1.8% |
| Other values (4) | 3111 | 4.8% |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 114840 | |
| o | 34310 | 6.9% |
| r | 31295 | 6.3% |
| e | 28693 | 5.8% |
| , | 26620 | 5.4% |
| a | 23617 | 4.8% |
| i | 22805 | 4.6% |
| t | 22753 | 4.6% |
| n | 21456 | 4.3% |
| s | 21294 | 4.3% |
| Other values (27) | 147776 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 495459 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| (space) | 114840 | |
| o | 34310 | 6.9% |
| r | 31295 | 6.3% |
| e | 28693 | 5.8% |
| , | 26620 | 5.4% |
| a | 23617 | 4.8% |
| i | 22805 | 4.6% |
| t | 22753 | 4.6% |
| n | 21456 | 4.3% |
| s | 21294 | 4.3% |
| Other values (27) | 147776 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 495459 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| (space) | 114840 | |
| o | 34310 | 6.9% |
| r | 31295 | 6.3% |
| e | 28693 | 5.8% |
| , | 26620 | 5.4% |
| a | 23617 | 4.8% |
| i | 22805 | 4.6% |
| t | 22753 | 4.6% |
| n | 21456 | 4.3% |
| s | 21294 | 4.3% |
| Other values (27) | 147776 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 495459 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| (space) | 114840 | |
| o | 34310 | 6.9% |
| r | 31295 | 6.3% |
| e | 28693 | 5.8% |
| , | 26620 | 5.4% |
| a | 23617 | 4.8% |
| i | 22805 | 4.6% |
| t | 22753 | 4.6% |
| n | 21456 | 4.3% |
| s | 21294 | 4.3% |
| Other values (27) | 147776 |
| Distinct | 183 |
|---|---|
| Distinct (%) | 2.6% |
| Missing | 12592 |
| Missing (%) | 63.9% |
| Memory size | 154.2 KiB |
Length
| Max length | 189 |
|---|---|
| Median length | 170 |
| Mean length | 26.534316 |
| Min length | 4 |
Unique
| Unique | 82 |
|---|---|
| Unique (%) | 1.2% |
Sample
| 1st row | Microsoft Azure |
|---|---|
| 2nd row | Amazon Web Services (AWS) |
| 3rd row | Google Cloud Platform (GCP) , Amazon Web Services (AWS) , Microsoft Azure |
| 4th row | None |
| 5th row | Google Cloud Platform (GCP) |
| Value | Count | Frequency (%) |
| cloud | 3233 | |
| web | 2758 | |
| services | 2758 | |
| aws | 2758 | |
| amazon | 2758 | |
| 2621 | ||
| none | 2229 | |
| 2134 | ||
| platform | 2134 | |
| gcp | 2134 | |
| Other values (11) | 4059 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 34524 | |
| o | 17449 | 9.2% |
| e | 14804 | 7.8% |
| r | 8222 | 4.3% |
| l | 7888 | 4.2% |
| A | 7072 | 3.7% |
| S | 5721 | 3.0% |
| a | 5638 | 3.0% |
| W | 5516 | 2.9% |
| C | 5367 | 2.8% |
| Other values (28) | 76856 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 189057 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| (space) | 34524 | |
| o | 17449 | 9.2% |
| e | 14804 | 7.8% |
| r | 8222 | 4.3% |
| l | 7888 | 4.2% |
| A | 7072 | 3.7% |
| S | 5721 | 3.0% |
| a | 5638 | 3.0% |
| W | 5516 | 2.9% |
| C | 5367 | 2.8% |
| Other values (28) | 76856 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 189057 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| (space) | 34524 | |
| o | 17449 | 9.2% |
| e | 14804 | 7.8% |
| r | 8222 | 4.3% |
| l | 7888 | 4.2% |
| A | 7072 | 3.7% |
| S | 5721 | 3.0% |
| a | 5638 | 3.0% |
| W | 5516 | 2.9% |
| C | 5367 | 2.8% |
| Other values (28) | 76856 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 189057 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| (space) | 34524 | |
| o | 17449 | 9.2% |
| e | 14804 | 7.8% |
| r | 8222 | 4.3% |
| l | 7888 | 4.2% |
| A | 7072 | 3.7% |
| S | 5721 | 3.0% |
| a | 5638 | 3.0% |
| W | 5516 | 2.9% |
| C | 5367 | 2.8% |
| Other values (28) | 76856 |
| Distinct | 336 |
|---|---|
| Distinct (%) | 4.7% |
| Missing | 12617 |
| Missing (%) | 64.0% |
| Memory size | 154.2 KiB |
Length
| Max length | 231 |
|---|---|
| Median length | 224 |
| Mean length | 27.010986 |
| Min length | 4 |
Unique
| Unique | 155 |
|---|---|
| Unique (%) | 2.2% |
Sample
| 1st row | Azure Virtual Machines, Azure Container Service |
|---|---|
| 2nd row | AWS Elastic Compute Cloud (EC2) |
| 3rd row | Google Compute Engine (GCE), AWS Lambda, Azure Virtual Machines |
| 4th row | None |
| 5th row | AWS Elastic Compute Cloud (EC2) |
| Value | Count | Frequency (%) |
| aws | 3281 | |
| none | 3155 | |
| 2963 | ||
| compute | 2948 | |
| cloud | 2512 | |
| engine | 2261 | 7.7% |
| elastic | 2121 | 7.2% |
| ec2 | 1810 | 6.2% |
| azure | 1233 | 4.2% |
| gce | 1138 | 3.9% |
| Other values (11) | 6001 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 22323 | 11.6% |
| e | 16711 | 8.7% |
| o | 15638 | 8.2% |
| n | 11546 | 6.0% |
| C | 8803 | 4.6% |
| u | 8759 | 4.6% |
| l | 8745 | 4.6% |
| t | 8364 | 4.4% |
| i | 7550 | 3.9% |
| E | 7330 | 3.8% |
| Other values (29) | 76009 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 191778 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| (space) | 22323 | 11.6% |
| e | 16711 | 8.7% |
| o | 15638 | 8.2% |
| n | 11546 | 6.0% |
| C | 8803 | 4.6% |
| u | 8759 | 4.6% |
| l | 8745 | 4.6% |
| t | 8364 | 4.4% |
| i | 7550 | 3.9% |
| E | 7330 | 3.8% |
| Other values (29) | 76009 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 191778 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| (space) | 22323 | 11.6% |
| e | 16711 | 8.7% |
| o | 15638 | 8.2% |
| n | 11546 | 6.0% |
| C | 8803 | 4.6% |
| u | 8759 | 4.6% |
| l | 8745 | 4.6% |
| t | 8364 | 4.4% |
| i | 7550 | 3.9% |
| E | 7330 | 3.8% |
| Other values (29) | 76009 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 191778 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| (space) | 22323 | 11.6% |
| e | 16711 | 8.7% |
| o | 15638 | 8.2% |
| n | 11546 | 6.0% |
| C | 8803 | 4.6% |
| u | 8759 | 4.6% |
| l | 8745 | 4.6% |
| t | 8364 | 4.4% |
| i | 7550 | 3.9% |
| E | 7330 | 3.8% |
| Other values (29) | 76009 |
| Distinct | 287 |
|---|---|
| Distinct (%) | 4.1% |
| Missing | 12639 |
| Missing (%) | 64.1% |
| Memory size | 154.2 KiB |
Length
| Max length | 173 |
|---|---|
| Median length | 4 |
| Mean length | 13.882311 |
| Min length | 4 |
Unique
| Unique | 143 |
|---|---|
| Unique (%) | 2.0% |
Sample
| 1st row | Databricks, Microsoft Analysis Services |
|---|---|
| 2nd row | AWS Elastic MapReduce |
| 3rd row | Google BigQuery, Databricks |
| 4th row | None |
| 5th row | Google Cloud Dataflow |
| Value | Count | Frequency (%) |
| none | 4133 | |
| 1881 | ||
| aws | 1641 | 10.9% |
| bigquery | 958 | 6.4% |
| cloud | 923 | 6.2% |
| databricks | 604 | 4.0% |
| redshift | 562 | 3.7% |
| dataflow | 525 | 3.5% |
| elastic | 429 | 2.9% |
| mapreduce | 429 | 2.9% |
| Other values (8) | 2914 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 10482 | 10.7% |
| o | 10195 | 10.4% |
| (space) | 7921 | 8.1% |
| n | 5209 | 5.3% |
| a | 4877 | 5.0% |
| i | 4393 | 4.5% |
| l | 4184 | 4.3% |
| N | 4133 | 4.2% |
| s | 3861 | 3.9% |
| t | 3503 | 3.6% |
| Other values (30) | 39501 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 98259 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| e | 10482 | 10.7% |
| o | 10195 | 10.4% |
| (space) | 7921 | 8.1% |
| n | 5209 | 5.3% |
| a | 4877 | 5.0% |
| i | 4393 | 4.5% |
| l | 4184 | 4.3% |
| N | 4133 | 4.2% |
| s | 3861 | 3.9% |
| t | 3503 | 3.6% |
| Other values (30) | 39501 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 98259 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| e | 10482 | 10.7% |
| o | 10195 | 10.4% |
| (space) | 7921 | 8.1% |
| n | 5209 | 5.3% |
| a | 4877 | 5.0% |
| i | 4393 | 4.5% |
| l | 4184 | 4.3% |
| N | 4133 | 4.2% |
| s | 3861 | 3.9% |
| t | 3503 | 3.6% |
| Other values (30) | 39501 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 98259 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| e | 10482 | 10.7% |
| o | 10195 | 10.4% |
| (space) | 7921 | 8.1% |
| n | 5209 | 5.3% |
| a | 4877 | 5.0% |
| i | 4393 | 4.5% |
| l | 4184 | 4.3% |
| N | 4133 | 4.2% |
| s | 3861 | 3.9% |
| t | 3503 | 3.6% |
| Other values (30) | 39501 |
| Distinct | 272 |
|---|---|
| Distinct (%) | 3.9% |
| Missing | 12667 |
| Missing (%) | 64.2% |
| Memory size | 154.2 KiB |
Length
| Max length | 219 |
|---|---|
| Median length | 4 |
| Mean length | 16.253617 |
| Min length | 3 |
Unique
| Unique | 126 |
|---|---|
| Unique (%) | 1.8% |
Sample
| 1st row | Azure Machine Learning Studio |
|---|---|
| 2nd row | RapidMiner |
| 3rd row | SAS, Azure Machine Learning Studio, Google Cloud Machine Learning Engine |
| 4th row | None |
| 5th row | Google Cloud Translation |
| Value | Count | Frequency (%) |
| none | 4313 | |
| cloud | 2111 | |
| 2111 | ||
| machine | 1167 | 6.8% |
| learning | 1167 | 6.8% |
| engine | 586 | 3.4% |
| azure | 581 | 3.4% |
| studio | 581 | 3.4% |
| amazon | 569 | 3.3% |
| sagemaker | 569 | 3.3% |
| Other values (9) | 3283 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 13685 | 11.9% |
| o | 13339 | 11.6% |
| n | 11222 | 9.8% |
| (space) | 9988 | 8.7% |
| a | 6954 | 6.1% |
| l | 5355 | 4.7% |
| g | 5233 | 4.6% |
| i | 5090 | 4.4% |
| N | 4713 | 4.1% |
| u | 4491 | 3.9% |
| Other values (24) | 34518 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 114588 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| e | 13685 | 11.9% |
| o | 13339 | 11.6% |
| n | 11222 | 9.8% |
| (space) | 9988 | 8.7% |
| a | 6954 | 6.1% |
| l | 5355 | 4.7% |
| g | 5233 | 4.6% |
| i | 5090 | 4.4% |
| N | 4713 | 4.1% |
| u | 4491 | 3.9% |
| Other values (24) | 34518 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 114588 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| e | 13685 | 11.9% |
| o | 13339 | 11.6% |
| n | 11222 | 9.8% |
| (space) | 9988 | 8.7% |
| a | 6954 | 6.1% |
| l | 5355 | 4.7% |
| g | 5233 | 4.6% |
| i | 5090 | 4.4% |
| N | 4713 | 4.1% |
| u | 4491 | 3.9% |
| Other values (24) | 34518 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 114588 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| e | 13685 | 11.9% |
| o | 13339 | 11.6% |
| n | 11222 | 9.8% |
| (space) | 9988 | 8.7% |
| a | 6954 | 6.1% |
| l | 5355 | 4.7% |
| g | 5233 | 4.6% |
| i | 5090 | 4.4% |
| N | 4713 | 4.1% |
| u | 4491 | 3.9% |
| Other values (24) | 34518 |
which_automated_machine_learning_tools_or_partial_auto_ml_tools_do_you_use_on_a_regular_basis
Text
Missing 
| Distinct | 201 |
|---|---|
| Distinct (%) | 2.9% |
| Missing | 12702 |
| Missing (%) | 64.4% |
| Memory size | 154.2 KiB |
Length
| Max length | 153 |
|---|---|
| Median length | 4 |
| Mean length | 9.4597292 |
| Min length | 4 |
Unique
| Unique | 100 |
|---|---|
| Unique (%) | 1.4% |
Sample
| 1st row | None |
|---|---|
| 2nd row | Auto-Keras |
| 3rd row | Google AutoML , Tpot , Auto-Keras , Auto-Sklearn , Auto_ml |
| 4th row | None |
| 5th row | Google AutoML |
| Value | Count | Frequency (%) |
| none | 5175 | |
| 1266 | 11.6% | |
| automl | 860 | 7.8% |
| auto-sklearn | 756 | 6.9% |
| 498 | 4.5% | |
| auto-keras | 465 | 4.2% |
| auto_ml | 279 | 2.5% |
| h20 | 277 | 2.5% |
| driverless | 277 | 2.5% |
| ai | 277 | 2.5% |
| Other values (6) | 831 | 7.6% |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 10742 | |
| o | 9182 | |
| e | 7608 | |
| n | 5931 | 8.9% |
| N | 5175 | 7.8% |
| t | 3201 | 4.8% |
| A | 2637 | 4.0% |
| u | 2360 | 3.6% |
| r | 2098 | 3.2% |
| a | 1945 | 2.9% |
| Other values (29) | 15481 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 66360 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| (space) | 10742 | |
| o | 9182 | |
| e | 7608 | |
| n | 5931 | 8.9% |
| N | 5175 | 7.8% |
| t | 3201 | 4.8% |
| A | 2637 | 4.0% |
| u | 2360 | 3.6% |
| r | 2098 | 3.2% |
| a | 1945 | 2.9% |
| Other values (29) | 15481 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 66360 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| (space) | 10742 | |
| o | 9182 | |
| e | 7608 | |
| n | 5931 | 8.9% |
| N | 5175 | 7.8% |
| t | 3201 | 4.8% |
| A | 2637 | 4.0% |
| u | 2360 | 3.6% |
| r | 2098 | 3.2% |
| a | 1945 | 2.9% |
| Other values (29) | 15481 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 66360 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| (space) | 10742 | |
| o | 9182 | |
| e | 7608 | |
| n | 5931 | 8.9% |
| N | 5175 | 7.8% |
| t | 3201 | 4.8% |
| A | 2637 | 4.0% |
| u | 2360 | 3.6% |
| r | 2098 | 3.2% |
| a | 1945 | 2.9% |
| Other values (29) | 15481 |
| Distinct | 454 |
|---|---|
| Distinct (%) | 6.5% |
| Missing | 12723 |
| Missing (%) | 64.5% |
| Memory size | 154.2 KiB |
Length
| Max length | 168 |
|---|---|
| Median length | 146 |
| Mean length | 24.700743 |
| Min length | 4 |
Unique
| Unique | 196 |
|---|---|
| Unique (%) | 2.8% |
Sample
| 1st row | Azure SQL Database |
|---|---|
| 2nd row | PostgresSQL, AWS Relational Database Service |
| 3rd row | MySQL, PostgresSQL |
| 4th row | MySQL |
| 5th row | MySQL |
| Value | Count | Frequency (%) |
| mysql | 3122 | |
| sql | 2857 | |
| microsoft | 2399 | |
| database | 2259 | |
| postgressql | 2160 | |
| server | 1852 | |
| sqlite | 1527 | 6.5% |
| none | 1245 | 5.3% |
| oracle | 1192 | 5.1% |
| aws | 1003 | 4.3% |
| Other values (8) | 3956 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 16578 | 9.6% |
| e | 15690 | 9.1% |
| S | 13109 | 7.6% |
| r | 10809 | 6.3% |
| o | 10784 | 6.2% |
| s | 10072 | 5.8% |
| Q | 9666 | 5.6% |
| L | 9666 | 5.6% |
| a | 9560 | 5.5% |
| t | 9220 | 5.3% |
| Other values (26) | 57603 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 172757 |
Most frequent character per category
(unknown)
| Value | Count | Frequency (%) |
| (space) | 16578 | 9.6% |
| e | 15690 | 9.1% |
| S | 13109 | 7.6% |
| r | 10809 | 6.3% |
| o | 10784 | 6.2% |
| s | 10072 | 5.8% |
| Q | 9666 | 5.6% |
| L | 9666 | 5.6% |
| a | 9560 | 5.5% |
| t | 9220 | 5.3% |
| Other values (26) | 57603 |
Most occurring scripts
| Value | Count | Frequency (%) |
| (unknown) | 172757 |
Most frequent character per script
(unknown)
| Value | Count | Frequency (%) |
| (space) | 16578 | 9.6% |
| e | 15690 | 9.1% |
| S | 13109 | 7.6% |
| r | 10809 | 6.3% |
| o | 10784 | 6.2% |
| s | 10072 | 5.8% |
| Q | 9666 | 5.6% |
| L | 9666 | 5.6% |
| a | 9560 | 5.5% |
| t | 9220 | 5.3% |
| Other values (26) | 57603 |
Most occurring blocks
| Value | Count | Frequency (%) |
| (unknown) | 172757 |
Most frequent character per block
(unknown)
| Value | Count | Frequency (%) |
| (space) | 16578 | 9.6% |
| e | 15690 | 9.1% |
| S | 13109 | 7.6% |
| r | 10809 | 6.3% |
| o | 10784 | 6.2% |
| s | 10072 | 5.8% |
| Q | 9666 | 5.6% |
| L | 9666 | 5.6% |
| a | 9560 | 5.5% |
| t | 9220 | 5.3% |
| Other values (26) | 57603 |
Correlations
| approximately_how_many_individuals_are_responsible_for_data_science_workloads_at_your_place_of_business | approximately_how_much_money_have_you_spent_on_machine_learning_and_or_cloud_computing_products_at_your_work_in_the_past_5_years | does_your_current_employer_incorporate_machine_learning_methods_into_their_business | for_how_many_years_have_you_used_machine_learning_methods | have_you_ever_used_a_tpu_tensor_processing_unit | how_long_have_you_been_writing_code_to_analyze_data_at_work_or_at_school | select_the_title_most_similar_to_your_current_role_or_most_recent_title_if_retired | what_is_the_highest_level_of_formal_education_that_you_have_attained_or_plan_to_attain_within_the_next_2_years | what_is_the_size_of_the_company_where_you_are_employed | what_is_your_age_#_years | what_is_your_current_yearly_compensation_approximate_$_usd | what_is_your_gender | what_programming_language_would_you_recommend_an_aspiring_data_scientist_to_learn_first | which_categories_of_computer_vision_methods_do_you_use_on_a_regular_basis | which_of_the_following_natural_language_processing_nlp_methods_do_you_use_on_a_regular_basis | which_types_of_specialized_hardware_do_you_use_on_a_regular_basis | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| approximately_how_many_individuals_are_responsible_for_data_science_workloads_at_your_place_of_business | 1.000 | 0.155 | 0.244 | 0.112 | 0.034 | 0.118 | 0.107 | 0.057 | 0.302 | 0.038 | 0.108 | 0.014 | 0.025 | 0.000 | 0.055 | 0.036 |
| approximately_how_much_money_have_you_spent_on_machine_learning_and_or_cloud_computing_products_at_your_work_in_the_past_5_years | 0.155 | 1.000 | 0.173 | 0.158 | 0.073 | 0.153 | 0.091 | 0.046 | 0.103 | 0.096 | 0.201 | 0.036 | 0.037 | 0.052 | 0.051 | 0.089 |
| does_your_current_employer_incorporate_machine_learning_methods_into_their_business | 0.244 | 0.173 | 1.000 | 0.201 | 0.052 | 0.162 | 0.161 | 0.066 | 0.120 | 0.059 | 0.141 | 0.030 | 0.041 | 0.059 | 0.090 | 0.099 |
| for_how_many_years_have_you_used_machine_learning_methods | 0.112 | 0.158 | 0.201 | 1.000 | 0.089 | 0.475 | 0.180 | 0.161 | 0.035 | 0.185 | 0.165 | 0.046 | 0.048 | 0.089 | 0.096 | 0.109 |
| have_you_ever_used_a_tpu_tensor_processing_unit | 0.034 | 0.073 | 0.052 | 0.089 | 1.000 | 0.053 | 0.039 | 0.021 | 0.024 | 0.034 | 0.055 | 0.041 | 0.036 | 0.114 | 0.109 | 0.278 |
| how_long_have_you_been_writing_code_to_analyze_data_at_work_or_at_school | 0.118 | 0.153 | 0.162 | 0.475 | 0.053 | 1.000 | 0.188 | 0.152 | 0.058 | 0.271 | 0.205 | 0.048 | 0.066 | 0.069 | 0.078 | 0.077 |
| select_the_title_most_similar_to_your_current_role_or_most_recent_title_if_retired | 0.107 | 0.091 | 0.161 | 0.180 | 0.039 | 0.188 | 1.000 | 0.182 | 0.050 | 0.194 | 0.069 | 0.056 | 0.075 | 0.044 | 0.062 | 0.070 |
| what_is_the_highest_level_of_formal_education_that_you_have_attained_or_plan_to_attain_within_the_next_2_years | 0.057 | 0.046 | 0.066 | 0.161 | 0.021 | 0.152 | 0.182 | 1.000 | 0.067 | 0.186 | 0.086 | 0.080 | 0.051 | 0.065 | 0.000 | 0.036 |
| what_is_the_size_of_the_company_where_you_are_employed | 0.302 | 0.103 | 0.120 | 0.035 | 0.024 | 0.058 | 0.050 | 0.067 | 1.000 | 0.083 | 0.139 | 0.013 | 0.039 | 0.018 | 0.043 | 0.042 |
| what_is_your_age_#_years | 0.038 | 0.096 | 0.059 | 0.185 | 0.034 | 0.271 | 0.194 | 0.186 | 0.083 | 1.000 | 0.149 | 0.062 | 0.061 | 0.040 | 0.030 | 0.029 |
| what_is_your_current_yearly_compensation_approximate_$_usd | 0.108 | 0.201 | 0.141 | 0.165 | 0.055 | 0.205 | 0.069 | 0.086 | 0.139 | 0.149 | 1.000 | 0.059 | 0.040 | 0.055 | 0.050 | 0.045 |
| what_is_your_gender | 0.014 | 0.036 | 0.030 | 0.046 | 0.041 | 0.048 | 0.056 | 0.080 | 0.013 | 0.062 | 0.059 | 1.000 | 0.053 | 0.088 | 0.041 | 0.110 |
| what_programming_language_would_you_recommend_an_aspiring_data_scientist_to_learn_first | 0.025 | 0.037 | 0.041 | 0.048 | 0.036 | 0.066 | 0.075 | 0.051 | 0.039 | 0.061 | 0.040 | 0.053 | 1.000 | 0.060 | 0.078 | 0.063 |
| which_categories_of_computer_vision_methods_do_you_use_on_a_regular_basis | 0.000 | 0.052 | 0.059 | 0.089 | 0.114 | 0.069 | 0.044 | 0.065 | 0.018 | 0.040 | 0.055 | 0.088 | 0.060 | 1.000 | 0.139 | 0.174 |
| which_of_the_following_natural_language_processing_nlp_methods_do_you_use_on_a_regular_basis | 0.055 | 0.051 | 0.090 | 0.096 | 0.109 | 0.078 | 0.062 | 0.000 | 0.043 | 0.030 | 0.050 | 0.041 | 0.078 | 0.139 | 1.000 | 0.099 |
| which_types_of_specialized_hardware_do_you_use_on_a_regular_basis | 0.036 | 0.089 | 0.099 | 0.109 | 0.278 | 0.077 | 0.070 | 0.036 | 0.042 | 0.029 | 0.045 | 0.110 | 0.063 | 0.174 | 0.099 | 1.000 |
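
The matrix above is symmetric, bounded by 0 and 1, and has 1.000 on the diagonal, which is consistent with a categorical association measure such as Cramér's V. This section does not name the coefficient, so the following is only a minimal sketch under that assumption. It computes the plain (uncorrected) Cramér's V and may not reproduce the exact figures if the profiler applies a bias correction. The function name `cramers_v` and the toy columns are hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Plain (uncorrected) Cramér's V for two equal-length categorical columns.

    Pairs where either value is None (missing) are skipped.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    row_totals = Counter(x for x, _ in pairs)
    col_totals = Counter(y for _, y in pairs)
    observed = Counter(pairs)

    # Degrees-of-freedom term: min(#rows, #cols) - 1
    k = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or k <= 0:
        return 0.0  # undefined with fewer than two categories; report 0 here

    # Pearson chi-square statistic over the full contingency table
    chi2 = 0.0
    for x, row_total in row_totals.items():
        for y, col_total in col_totals.items():
            expected = row_total * col_total / n
            chi2 += (observed[(x, y)] - expected) ** 2 / expected

    return sqrt(chi2 / (n * k))


# Toy columns (made up for illustration, not taken from the survey data)
age = ["18-21", "22-24", "18-21", "22-24"]
role = ["Student", "Data Scientist", "Student", "Data Scientist"]
gender = ["Male", "Male", "Female", "Female"]
```

In this toy data `role` is fully determined by `age`, while `gender` is balanced across both age bands, so the two calls `cramers_v(age, role)` and `cramers_v(age, gender)` sit at the opposite ends of the scale.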
Missing values
Sample
| | what_is_your_age_#_years | what_is_your_gender | in_which_country_do_you_currently_reside | what_is_the_highest_level_of_formal_education_that_you_have_attained_or_plan_to_attain_within_the_next_2_years | select_the_title_most_similar_to_your_current_role_or_most_recent_title_if_retired | what_is_the_size_of_the_company_where_you_are_employed | approximately_how_many_individuals_are_responsible_for_data_science_workloads_at_your_place_of_business | does_your_current_employer_incorporate_machine_learning_methods_into_their_business | what_is_your_current_yearly_compensation_approximate_$_usd | approximately_how_much_money_have_you_spent_on_machine_learning_and_or_cloud_computing_products_at_your_work_in_the_past_5_years | what_is_the_primary_tool_that_you_use_at_work_or_school_to_analyze_data | how_long_have_you_been_writing_code_to_analyze_data_at_work_or_at_school | what_programming_language_would_you_recommend_an_aspiring_data_scientist_to_learn_first | have_you_ever_used_a_tpu_tensor_processing_unit | for_how_many_years_have_you_used_machine_learning_methods | select_any_activities_that_make_up_an_important_part_of_your_role_at_work | who_what_are_your_favorite_media_sources_that_report_on_data_science_topics | on_which_platforms_have_you_begun_or_completed_data_science_courses | which_of_the_following_integrated_development_environments_id_es_do_you_use_on_a_regular_basis | which_of_the_following_hosted_notebook_products_do_you_use_on_a_regular_basis | what_programming_languages_do_you_use_on_a_regular_basis | what_data_visualization_libraries_or_tools_do_you_use_on_a_regular_basis | which_types_of_specialized_hardware_do_you_use_on_a_regular_basis | which_of_the_following_ml_algorithms_do_you_use_on_a_regular_basis | which_categories_of_ml_tools_do_you_use_on_a_regular_basis | which_categories_of_computer_vision_methods_do_you_use_on_a_regular_basis | which_of_the_following_natural_language_processing_nlp_methods_do_you_use_on_a_regular_basis | which_of_the_following_machine_learning_frameworks_do_you_use_on_a_regular_basis | which_of_the_following_cloud_computing_platforms_do_you_use_on_a_regular_basis | which_specific_cloud_computing_products_do_you_use_on_a_regular_basis | which_specific_big_data_analytics_products_do_you_use_on_a_regular_basis | which_of_the_following_machine_learning_products_do_you_use_on_a_regular_basis | which_automated_machine_learning_tools_or_partial_auto_ml_tools_do_you_use_on_a_regular_basis | which_of_the_following_relational_database_products_do_you_use_on_a_regular_basis |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 22-24 | Male | France | Master’s degree | Software Engineer | 1000-9,999 employees | 0 | I do not know | 30,000-39,999 | $0 (USD) | Basic statistical software (Microsoft Excel, Google Sheets, etc.), 0, -1, -1, -1, -1 | 1-2 years | Python | Never | 1-2 years | NaN | Twitter (data science influencers), Kaggle (forums, blog, social media, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc) | Coursera, DataCamp, Kaggle Courses (i.e. Kaggle Learn), Udemy | Jupyter (JupyterLab, Jupyter Notebooks, etc) , RStudio , PyCharm , MATLAB , Spyder | None | Python, R, SQL, Java, Javascript, MATLAB | Matplotlib | CPUs, GPUs | Linear or Logistic Regression | None | NaN | NaN | None | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 40-44 | Male | India | Professional degree | Software Engineer | > 10,000 employees | 20+ | We have well established ML methods (i.e., models in production for more than 2 years) | 5,000-7,499 | > $100,000 ($USD) | Cloud-based data software & APIs (AWS, GCP, Azure, etc.), -1, -1, -1, -1, 0 | I have never written code | NaN | NaN | NaN | Analyze and understand data to influence product or business decisions, Build and/or run the data infrastructure that my business uses for storing, analyzing, and operationalizing data, Build prototypes to explore applying machine learning to new areas, Build and/or run a machine learning service that operationally improves my product or workflows | Kaggle (forums, blog, social media, etc), YouTube (Cloud AI Adventures, Siraj Raval, etc), Podcasts (Chai Time Data Science, Linear Digressions, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc) | Coursera, DataCamp, Kaggle Courses (i.e. Kaggle Learn), Udemy | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 55-59 | Female | Germany | Professional degree | NaN | NaN | NaN | NaN | NaN | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 40-44 | Male | Australia | Master’s degree | Other | > 10,000 employees | 20+ | I do not know | 250,000-299,999 | $10,000-$99,999 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 0, -1 | 1-2 years | Python | Once | 2-3 years | NaN | Podcasts (Chai Time Data Science, Linear Digressions, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc), Slack Communities (ods.ai, kagglenoobs, etc) | Coursera, edX, DataCamp, University Courses (resulting in a university degree) | Jupyter (JupyterLab, Jupyter Notebooks, etc) , Visual Studio / Visual Studio Code | Microsoft Azure Notebooks | Python, R, SQL, Bash | Ggplot / ggplot2 , Matplotlib , Seaborn | CPUs, GPUs | Linear or Logistic Regression, Convolutional Neural Networks | Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI) | General purpose image/video tools (PIL, cv2, skimage, etc), Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc) | NaN | Scikit-learn , TensorFlow , Keras , RandomForest | Microsoft Azure | Azure Virtual Machines, Azure Container Service | Databricks, Microsoft Analysis Services | Azure Machine Learning Studio | None | Azure SQL Database |
| 4 | 22-24 | Male | India | Bachelor’s degree | Other | 0-49 employees | 0 | No (we do not use ML methods) | 4,000-4,999 | $0 (USD) | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 1, -1 | < 1 years | Python | Never | < 1 years | NaN | YouTube (Cloud AI Adventures, Siraj Raval, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Other | Other | Jupyter (JupyterLab, Jupyter Notebooks, etc) | Google Colab , Google Cloud Notebook Products (AI Platform, Datalab, etc) | Python, SQL | Matplotlib , Plotly / Plotly Express , Seaborn | CPUs, GPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Gradient Boosting Machines (xgboost, lightgbm, etc) | None | NaN | NaN | Scikit-learn , RandomForest, Xgboost , LightGBM | NaN | NaN | NaN | NaN | NaN | NaN |
| 5 | 50-54 | Male | France | Master’s degree | Data Scientist | 0-49 employees | 3-4 | We have well established ML methods (i.e., models in production for more than 2 years) | 60,000-69,999 | $10,000-$99,999 | Advanced statistical software (SPSS, SAS, etc.), -1, 0, -1, -1, -1 | 20+ years | Java | Never | 10-15 years | Build prototypes to explore applying machine learning to new areas, Do research that advances the state of the art of machine learning | YouTube (Cloud AI Adventures, Siraj Raval, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc) | None | RStudio , Other | None | Python, R | Ggplot / ggplot2 | CPUs, GPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Gradient Boosting Machines (xgboost, lightgbm, etc), Bayesian Approaches, Convolutional Neural Networks, Generative Adversarial Networks, Recurrent Neural Networks | Automated model selection (e.g. auto-sklearn, xcessiv), Automated hyperparameter tuning (e.g. hyperopt, ray.tune), Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI) | None | Word embeddings/vectors (GLoVe, fastText, word2vec), Encoder-decorder models (seq2seq, vanilla transformers) | Scikit-learn , TensorFlow , Keras , RandomForest, Xgboost , Caret | Amazon Web Services (AWS) | AWS Elastic Compute Cloud (EC2) | AWS Elastic MapReduce | RapidMiner | Auto-Keras | PostgresSQL, AWS Relational Database Service |
| 6 | 22-24 | Male | India | Master’s degree | Data Scientist | 50-249 employees | 20+ | We are exploring ML methods (and may one day put a model into production) | 10,000-14,999 | $100-$999 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 2, -1 | 3-5 years | Python | 6-24 times | 2-3 years | Analyze and understand data to influence product or business decisions, Experimentation and iteration to improve existing ML models, Do research that advances the state of the art of machine learning | Kaggle (forums, blog, social media, etc), Course Forums (forums.fast.ai, etc), YouTube (Cloud AI Adventures, Siraj Raval, etc), Podcasts (Chai Time Data Science, Linear Digressions, etc), Journal Publications (traditional publications, preprint journals, etc) | Udacity, Coursera, edX, Kaggle Courses (i.e. Kaggle Learn), Udemy | Jupyter (JupyterLab, Jupyter Notebooks, etc) , Spyder , Notepad++ , Sublime Text | Kaggle Notebooks (Kernels) , Google Colab , Binder / JupyterHub | Python, R, Bash | Matplotlib , Plotly / Plotly Express , Bokeh , Seaborn | CPUs, GPUs | Linear or Logistic Regression, Dense Neural Networks (MLPs, etc), Convolutional Neural Networks, Recurrent Neural Networks | Automated data augmentation (e.g. imgaug, albumentations), Automated feature engineering/selection (e.g. tpot, boruta_py), Automated model selection (e.g. auto-sklearn, xcessiv), Automation of full ML pipelines (e.g. Google AutoML, H20 Driverless AI) | General purpose image/video tools (PIL, cv2, skimage, etc), Image segmentation methods (U-Net, Mask R-CNN, etc), Object detection methods (YOLOv3, RetinaNet, etc) | Word embeddings/vectors (GLoVe, fastText, word2vec), Encoder-decorder models (seq2seq, vanilla transformers) | Scikit-learn , TensorFlow , Keras , PyTorch | Google Cloud Platform (GCP) , Amazon Web Services (AWS) , Microsoft Azure | Google Compute Engine (GCE), AWS Lambda, Azure Virtual Machines | Google BigQuery, Databricks | SAS, Azure Machine Learning Studio, Google Cloud Machine Learning Engine | Google AutoML , Tpot , Auto-Keras , Auto-Sklearn , Auto_ml | MySQL, PostgresSQL |
| 7 | 22-24 | Female | United States of America | Bachelor’s degree | Data Scientist | > 10,000 employees | 20+ | We recently started using ML methods (i.e., models in production for less than 2 years) | 80,000-89,999 | $0 (USD) | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 3, -1 | 3-5 years | Python | Once | 3-4 years | Analyze and understand data to influence product or business decisions, Build prototypes to explore applying machine learning to new areas, Build and/or run a machine learning service that operationally improves my product or workflows | Hacker News (https://news.ycombinator.com/), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc) | Udemy, University Courses (resulting in a university degree) | Jupyter (JupyterLab, Jupyter Notebooks, etc) , Spyder | Microsoft Azure Notebooks , AWS Notebook Products (EMR Notebooks, Sagemaker Notebooks, etc) | Python | Matplotlib , Plotly / Plotly Express | CPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Convolutional Neural Networks | None | General purpose image/video tools (PIL, cv2, skimage, etc), Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc) | NaN | Scikit-learn , TensorFlow , Keras , Spark MLib | NaN | NaN | NaN | NaN | NaN | NaN |
| 8 | 22-24 | Male | United States of America | Bachelor’s degree | Student | NaN | NaN | NaN | NaN | NaN | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 4, -1 | 3-5 years | Python | Never | 1-2 years | NaN | Kaggle (forums, blog, social media, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc) | Kaggle Courses (i.e. Kaggle Learn), University Courses (resulting in a university degree) | Jupyter (JupyterLab, Jupyter Notebooks, etc) , PyCharm , Atom | Google Colab | Python | Matplotlib , Seaborn | CPUs, GPUs | Linear or Logistic Regression, Decision Trees or Random Forests, Gradient Boosting Machines (xgboost, lightgbm, etc), Bayesian Approaches, Evolutionary Approaches, Dense Neural Networks (MLPs, etc), Convolutional Neural Networks | None | General purpose image/video tools (PIL, cv2, skimage, etc), Image classification and other general purpose networks (VGG, Inception, ResNet, ResNeXt, NASNet, EfficientNet, etc) | NaN | Scikit-learn , Xgboost , PyTorch , LightGBM | NaN | NaN | NaN | NaN | NaN | NaN |
| 9 | 55-59 | Male | Netherlands | Master’s degree | Other | 0-49 employees | 1-2 | We are exploring ML methods (and may one day put a model into production) | $0-999 | $100-$999 | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 5, -1 | 5-10 years | Python | Never | < 1 years | Other | Kaggle (forums, blog, social media, etc), Course Forums (forums.fast.ai, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc) | Coursera | Jupyter (JupyterLab, Jupyter Notebooks, etc) | None | Python, SQL | Matplotlib , D3.js , Seaborn | CPUs | Linear or Logistic Regression, Bayesian Approaches, Generative Adversarial Networks | None | None | NaN | Scikit-learn , PyTorch | None | None | None | None | None | MySQL |
| | what_is_your_age_#_years | what_is_your_gender | in_which_country_do_you_currently_reside | what_is_the_highest_level_of_formal_education_that_you_have_attained_or_plan_to_attain_within_the_next_2_years | select_the_title_most_similar_to_your_current_role_or_most_recent_title_if_retired | what_is_the_size_of_the_company_where_you_are_employed | approximately_how_many_individuals_are_responsible_for_data_science_workloads_at_your_place_of_business | does_your_current_employer_incorporate_machine_learning_methods_into_their_business | what_is_your_current_yearly_compensation_approximate_$_usd | approximately_how_much_money_have_you_spent_on_machine_learning_and_or_cloud_computing_products_at_your_work_in_the_past_5_years | what_is_the_primary_tool_that_you_use_at_work_or_school_to_analyze_data | how_long_have_you_been_writing_code_to_analyze_data_at_work_or_at_school | what_programming_language_would_you_recommend_an_aspiring_data_scientist_to_learn_first | have_you_ever_used_a_tpu_tensor_processing_unit | for_how_many_years_have_you_used_machine_learning_methods | select_any_activities_that_make_up_an_important_part_of_your_role_at_work | who_what_are_your_favorite_media_sources_that_report_on_data_science_topics | on_which_platforms_have_you_begun_or_completed_data_science_courses | which_of_the_following_integrated_development_environments_id_es_do_you_use_on_a_regular_basis | which_of_the_following_hosted_notebook_products_do_you_use_on_a_regular_basis | what_programming_languages_do_you_use_on_a_regular_basis | what_data_visualization_libraries_or_tools_do_you_use_on_a_regular_basis | which_types_of_specialized_hardware_do_you_use_on_a_regular_basis | which_of_the_following_ml_algorithms_do_you_use_on_a_regular_basis | which_categories_of_ml_tools_do_you_use_on_a_regular_basis | which_categories_of_computer_vision_methods_do_you_use_on_a_regular_basis | which_of_the_following_natural_language_processing_nlp_methods_do_you_use_on_a_regular_basis | which_of_the_following_machine_learning_frameworks_do_you_use_on_a_regular_basis | which_of_the_following_cloud_computing_platforms_do_you_use_on_a_regular_basis | which_specific_cloud_computing_products_do_you_use_on_a_regular_basis | which_specific_big_data_analytics_products_do_you_use_on_a_regular_basis | which_of_the_following_machine_learning_products_do_you_use_on_a_regular_basis | which_automated_machine_learning_tools_or_partial_auto_ml_tools_do_you_use_on_a_regular_basis | which_of_the_following_relational_database_products_do_you_use_on_a_regular_basis |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 19707 | 18-21 | Male | Viet Nam | NaN | NaN | NaN | NaN | NaN | NaN | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 19708 | 25-29 | Female | India | Professional degree | Not employed | NaN | NaN | NaN | NaN | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | NaN | Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc) | Coursera, DataCamp, Kaggle Courses (i.e. Kaggle Learn), LinkedIn Learning, University Courses (resulting in a university degree) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 19709 | 25-29 | Prefer not to say | Austria | No formal education past high school | Data Scientist | 250-999 employees | 1-2 | We use ML methods for generating insights (but do not put working models into production) | 1,000-1,999 | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | Analyze and understand data to influence product or business decisions | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 19710 | 22-24 | Male | India | Bachelor’s degree | Data Scientist | 50-249 employees | NaN | NaN | NaN | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 19711 | 18-21 | Male | India | Master’s degree | Student | NaN | NaN | NaN | NaN | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | NaN | Kaggle (forums, blog, social media, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc) | Coursera | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 19712 | 50-54 | Male | Japan | NaN | NaN | NaN | NaN | NaN | NaN | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 19713 | 18-21 | Male | India | Bachelor’s degree | Other | 250-999 employees | 3-4 | I do not know | $0-999 | $0 (USD) | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 28, -1 | 1-2 years | NaN | NaN | NaN | NaN | Reddit (r/machinelearning, r/datascience, etc) | DataCamp, Udemy | Jupyter (JupyterLab, Jupyter Notebooks, etc) , RStudio , PyCharm , Visual Studio / Visual Studio Code , Spyder , Notepad++ , Sublime Text | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 19714 | 35-39 | Male | India | Master’s degree | Student | NaN | NaN | NaN | NaN | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | NaN | Kaggle (forums, blog, social media, etc), Course Forums (forums.fast.ai, etc), YouTube (Cloud AI Adventures, Siraj Raval, etc), Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc), Journal Publications (traditional publications, preprint journals, etc) | Coursera, Kaggle Courses (i.e. Kaggle Learn) | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 19715 | 25-29 | Male | India | Master’s degree | Statistician | 50-249 employees | 15-19 | We recently started using ML methods (i.e., models in production for less than 2 years) | 1,000-1,999 | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | Other | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 19716 | 50-54 | Male | France | Bachelor’s degree | Software Engineer | > 10,000 employees | 20+ | We have well established ML methods (i.e., models in production for more than 2 years) | 60,000-69,999 | $0 (USD) | Local development environments (RStudio, JupyterLab, etc.), -1, -1, -1, 25, -1 | 3-5 years | Python | Never | 4-5 years | Build and/or run the data infrastructure that my business uses for storing, analyzing, and operationalizing data, Build prototypes to explore applying machine learning to new areas | Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc) | Coursera, edX, Udemy | Jupyter (JupyterLab, Jupyter Notebooks, etc) , Visual Studio / Visual Studio Code | IBM Watson Studio | Python, SQL, Java, Bash | Matplotlib | CPUs | Linear or Logistic Regression, Decision Trees or Random Forests | Automated model selection (e.g. auto-sklearn, xcessiv), Automated hyperparameter tuning (e.g. hyperopt, ray.tune) | NaN | NaN | Scikit-learn , Spark MLib | NaN | NaN | NaN | NaN | NaN | NaN |
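
Many cells in the sample rows hold several selected options joined by commas, and some options contain commas inside parentheses, for example `Kaggle (forums, blog, social media, etc)`. The following is a rough sketch of one way to split such a cell, assuming that a comma separates options only when it sits outside parentheses. The helper name `split_multiselect` is hypothetical. The heuristic will still over-split options that contain a top-level comma, such as the activity option ending in "storing, analyzing, and operationalizing data".

```python
def split_multiselect(cell):
    """Split a multi-select cell on commas that are not inside parentheses."""
    parts = []
    current = []
    depth = 0  # current parenthesis nesting level
    for ch in cell:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth = max(depth - 1, 0)  # tolerate an unbalanced ")"
        if ch == "," and depth == 0:
            # Top-level comma: close the current option
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    # Drop empty fragments produced by stray separators
    return [p for p in parts if p]


# Cell text copied from the sample rows above
courses = split_multiselect(
    "Coursera, DataCamp, Kaggle Courses (i.e. Kaggle Learn), Udemy"
)
media = split_multiselect(
    "Kaggle (forums, blog, social media, etc), "
    "Blogs (Towards Data Science, Medium, Analytics Vidhya, KDnuggets etc)"
)
```

Tracking the nesting depth keeps the sketch to a single pass and avoids a regular expression that would need lookarounds for the parenthesised lists.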
Duplicate rows
Most frequently occurring
| | what_is_your_age_#_years | what_is_your_gender | in_which_country_do_you_currently_reside | what_is_the_highest_level_of_formal_education_that_you_have_attained_or_plan_to_attain_within_the_next_2_years | select_the_title_most_similar_to_your_current_role_or_most_recent_title_if_retired | what_is_the_size_of_the_company_where_you_are_employed | approximately_how_many_individuals_are_responsible_for_data_science_workloads_at_your_place_of_business | does_your_current_employer_incorporate_machine_learning_methods_into_their_business | what_is_your_current_yearly_compensation_approximate_$_usd | approximately_how_much_money_have_you_spent_on_machine_learning_and_or_cloud_computing_products_at_your_work_in_the_past_5_years | what_is_the_primary_tool_that_you_use_at_work_or_school_to_analyze_data | how_long_have_you_been_writing_code_to_analyze_data_at_work_or_at_school | what_programming_language_would_you_recommend_an_aspiring_data_scientist_to_learn_first | have_you_ever_used_a_tpu_tensor_processing_unit | for_how_many_years_have_you_used_machine_learning_methods | select_any_activities_that_make_up_an_important_part_of_your_role_at_work | who_what_are_your_favorite_media_sources_that_report_on_data_science_topics | on_which_platforms_have_you_begun_or_completed_data_science_courses | which_of_the_following_integrated_development_environments_id_es_do_you_use_on_a_regular_basis | which_of_the_following_hosted_notebook_products_do_you_use_on_a_regular_basis | what_programming_languages_do_you_use_on_a_regular_basis | what_data_visualization_libraries_or_tools_do_you_use_on_a_regular_basis | which_types_of_specialized_hardware_do_you_use_on_a_regular_basis | which_of_the_following_ml_algorithms_do_you_use_on_a_regular_basis | which_categories_of_ml_tools_do_you_use_on_a_regular_basis | which_categories_of_computer_vision_methods_do_you_use_on_a_regular_basis | which_of_the_following_natural_language_processing_nlp_methods_do_you_use_on_a_regular_basis | which_of_the_following_machine_learning_frameworks_do_you_use_on_a_regular_basis | which_of_the_following_cloud_computing_platforms_do_you_use_on_a_regular_basis | which_specific_cloud_computing_products_do_you_use_on_a_regular_basis | which_specific_big_data_analytics_products_do_you_use_on_a_regular_basis | which_of_the_following_machine_learning_products_do_you_use_on_a_regular_basis | which_automated_machine_learning_tools_or_partial_auto_ml_tools_do_you_use_on_a_regular_basis | which_of_the_following_relational_database_products_do_you_use_on_a_regular_basis | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 27 | 18-21 | Male | India | NaN | NaN | NaN | NaN | NaN | NaN | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 30 |
| 74 | 22-24 | Male | India | NaN | NaN | NaN | NaN | NaN | NaN | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 17 |
| 20 | 18-21 | Male | India | Bachelor’s degree | Student | NaN | NaN | NaN | NaN | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 12 |
| 55 | 22-24 | Male | China | NaN | NaN | NaN | NaN | NaN | NaN | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 12 |
| 118 | 25-29 | Male | United States of America | NaN | NaN | NaN | NaN | NaN | NaN | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 10 |
| 132 | 30-34 | Male | Japan | NaN | NaN | NaN | NaN | NaN | NaN | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 10 |
| 21 | 18-21 | Male | India | Bachelor’s degree | NaN | NaN | NaN | NaN | NaN | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 9 |
| 67 | 22-24 | Male | India | Bachelor’s degree | NaN | NaN | NaN | NaN | NaN | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 8 |
| 53 | 22-24 | Male | China | Master’s degree | Student | NaN | NaN | NaN | NaN | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 7 |
| 99 | 25-29 | Male | China | NaN | NaN | NaN | NaN | NaN | NaN | NaN | -1, -1, -1, -1, -1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 7 |
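
The duplicate rows listed above are mostly responses that stop after the first few questions, so they match on age, gender, and country and are missing everywhere else. A table of this kind could be rebuilt roughly as follows. This is a sketch that assumes the `# duplicates` column counts every occurrence of a repeated row, and that missing cells are represented as `None` (float NaN values do not compare equal to each other in Python, so they would not group together). The name `duplicate_counts` and the toy rows are hypothetical.

```python
from collections import Counter


def duplicate_counts(rows):
    """Return (row, occurrences) for rows seen more than once, most frequent first."""
    counts = Counter(tuple(row) for row in rows)
    return [(row, n) for row, n in counts.most_common() if n > 1]


# Toy rows shaped like the table above: (age, gender, country, education)
toy_rows = [
    ("18-21", "Male", "India", None),
    ("22-24", "Male", "India", None),
    ("18-21", "Male", "India", None),
    ("18-21", "Male", "India", "Bachelor’s degree"),
    ("22-24", "Male", "India", None),
    ("18-21", "Male", "India", None),
    ("30-34", "Male", "Japan", None),
]

toy_duplicates = duplicate_counts(toy_rows)
```

Rows are converted to tuples so they can be used as dictionary keys. A row that differs in a single cell, such as the one with an education level filled in, is counted separately.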